Compositions, methods and systems for discovery of lipopeptides

ABSTRACT

The invention relates to isolated polypeptides involved in lipopeptide biosynthesis and polynucleotides encoding such polypeptides. In particular, the isolated polypeptide may be an acyl-specific C-domain, an adenylating enzyme, or an acyl carrier. The invention also relates to methods for detecting a polypeptide involved in lipopeptide biosynthesis or a polynucleotide encoding such a polypeptide, as well as related computer readable media and computer systems.

RELATED APPLICATIONS

[0001] This application claims the benefit of U.S. Provisional Application No. 60/342,133, filed on Dec. 26, 2001, and U.S. Provisional Application No. 60/372,789, filed on Apr. 17, 2002. The present application is a continuation-in-part of U.S. application Ser. No. 09/976,059, filed Oct. 15, 2001, and of U.S. application Ser. No. 10/232,370, filed Sep. 3, 2002, which is a continuation-in-part of U.S. application Ser. No. 09/910,813, filed Jul. 24, 2001. The teachings of the above applications are incorporated herein by reference in their entirety.

FIELD OF INVENTION

[0002] The invention relates to genes and proteins involved in the biosynthesis of lipopeptides and related compounds, and to methods, systems and compositions for the discovery and engineering of new lipopeptide biosynthetic loci and new lipopeptides.

BACKGROUND

[0003] Lipopeptides are natural products that exhibit potent, broad-spectrum antibiotic activity with a high potential for biotechnological and pharmaceutical applications as antimicrobial, antifungal, or antiviral agents. Examples include lichenysin, fengycin, surfactin, syringomycin, serrawettin, ramoplanin, daptomycin, A54145, the “calcium-dependent antibiotic” of Streptomyces coelicolor, echinocandin, pneumocandin and aculeacin. Even within a group of relatively closely related actinomycete lipopeptide producers, lipopeptide natural products may differ in structure and can be classified into distinct sub-groups based on their chemical features. Lipoglycopeptides are lipopeptide natural products that are glycosylated, for example, ramoplanin. Acidic lipopeptides are lipopeptide natural products that are characterized by having acidic amino acid residues incorporated in the peptide chain portion of the lipopeptide, for example, daptomycin, A54145 and the calcium-dependent antibiotic of Streptomyces coelicolor.

[0004] A single microorganism may produce a mixture of related lipopeptides that differ in the lipid moiety that is attached to the peptide core via a free amine, usually the N-terminal amine of the peptide core. The lipid moiety can have a major influence on the biological properties of lipopeptide natural products. For example, the lipopeptide antibiotic A21978C complex produced by S. roseosporus comprises at least six related microbiologically active factors C₀, C₁, C₂, C₃, C₄, and C₅. All factors of the lipopeptide antibiotic A21978C complex bear an identical 13-amino acid cyclic, acidic polypeptide core, but differ from one another in the identity of the fatty acyl group at the terminal amino group. The biological properties, e.g., antibacterial efficacy, toxicity, solubility, etc., of the different A21978C factors vary. One of the six factors identified as part of the A21978C complex, the A21978C factor C₀, is also known as daptomycin. Likewise, the A54145 antibiotics produced by S. fradiae are a group of lipopeptides related to the A21978C complex. Like the A21978C complex, the A54145 antibiotics comprise at least eight microbiologically active, related factors A, A₁, B, B₁, C, D, E, and F. Each A54145 factor bears a cyclic 13-amino acid, acidic polypeptide core and a fatty acyl group attached to the N-terminal amine. The eight A54145 factors differ in the identity of the amino acid residue at positions 12 and 13 of the peptide core as well as in the identity of the fatty acyl group attached to the terminal amino group of the amino acid residue at position 1. There is a continuing need for compositions, methods and systems useful in discovery of lipopeptide natural products and related compounds.

[0005] Methods for natural product discovery have faced many challenges. Discovery efforts that focus on microbial-derived natural products are hampered by difficulties in cultivating the microbes; indeed, most microbes have yet to be cultivated in vitro. In addition, many cultivated microorganisms are not amenable to fermentation. Furthermore, many secondary metabolites are not expressed to detectable levels under in vitro conditions. Moreover, natural products produced under in vitro conditions often vary according to the growth conditions, e.g., nutrients provided, and may not be representative of the full biosynthetic potential of the microorganism. Genomics-based compositions, methods and systems for discovering lipopeptides would obviate or mitigate one or more of these disadvantages.

[0006] Lipopeptides produced by microorganisms are synthesized nonribosomally on large multifunctional proteins termed nonribosomal peptide synthetases (NRPSs) (Doekel and Marahiel, 2001, Metabolic Engineering, Vol. 3, pp. 64-77). NRPSs are modular proteins that consist of one or more polyfunctional polypeptides, each of which is made up of modules. The amino-terminal to carboxy-terminal order and specificities of the individual modules correspond to the sequential order and identity of the amino acid residues of the peptide product. Each NRPS module recognizes a specific amino acid substrate and catalyzes a stepwise condensation to form the growing peptide chain. The identity of the amino acid recognized by a particular unit can be determined by comparison with other units of known specificity (Challis and Ravel, 2000, FEMS Microbiology Letters, Vol. 187, pp. 111-114). In many peptide synthetases, there is a strict correlation between the order of repeated units in a peptide synthetase and the order in which the respective amino acids appear in the peptide product, making it possible to correlate peptides of known structure with putative genes encoding their synthesis, as demonstrated by the identification of the mycobactin biosynthetic gene cluster from the genome of Mycobacterium tuberculosis (Quadri et al., 1998, Chem. Biol., Vol. 5, pp. 631-645).
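
The correspondence between module order and residue order described in paragraph [0006] can be illustrated with a minimal sketch. The `Module` class, the `predicted_peptide` function and the three-module example are hypothetical illustrations and are not taken from the disclosed sequences; the sketch assumes strict collinearity, which the text notes holds for many, but not necessarily all, peptide synthetases.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Module:
    """One NRPS module, listed in amino- to carboxy-terminal order (hypothetical model)."""
    domains: List[str]   # e.g. ["C", "A", "T"]
    specificity: str     # amino acid assumed to be recognized by the A domain


def predicted_peptide(modules: List[Module]) -> str:
    """Read off the predicted peptide, assuming strict module/residue collinearity."""
    for index, module in enumerate(modules, start=1):
        if "A" not in module.domains:
            raise ValueError(f"module {index} has no adenylation (A) domain")
    return "-".join(module.specificity for module in modules)


# Hypothetical three-module synthetase, for illustration only.
example = [
    Module(["C", "A", "T"], "Asp"),
    Module(["C", "A", "T"], "Gly"),
    Module(["C", "A", "T"], "Ser"),
]
print(predicted_peptide(example))
```

In practice the specificity of each module would itself be inferred by comparison with units of known specificity, as the paragraph above describes; here it is simply supplied as input.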

[0007] The modules of a peptide synthetase are composed of smaller units or “domains” that each carry out a specific role in the recognition, activation, modification and joining of amino acid precursors to form the peptide product. One type of domain, the adenylation (A) domain, is responsible for selectively recognizing and activating the amino acid that is to be incorporated by a particular unit of the peptide synthetase. This activation step is ATP-dependent and involves the transient formation of an amino-acyl-adenylate. The activated amino acid is covalently attached to the peptide synthetase through another type of domain, the thiolation (T) domain, that is generally located adjacent to the A domain. The T domain is post-translationally modified by the covalent attachment of a phosphopantetheinyl prosthetic arm to a conserved serine residue. The activated amino acid substrates are tethered onto the nonribosomal peptide synthetase via a thioester bond to the phosphopantetheinyl prosthetic arm of the respective T domains. Amino acids joined to successive units of the peptide synthetase are subsequently covalently linked together by the formation of amide bonds catalyzed by another type of domain, the condensation (C) domain.

[0008] Little is known about the mechanism involved in attachment of lipid moieties to the peptide core. The literature is sparse regarding the enzymatic mechanism or timing of addition of the acyl group to lipopeptide natural products. In particular, the enzymes involved in N-acylation of peptide natural products have not been identified, and it remains unknown whether acylation occurs prior to, concomitant with, or subsequent to the formation of the peptide core. Doekel and Marahiel (2001, Metabolic Engineering, 3, 64-77) review catalytic domains in peptide synthetases and note that condensation domain sequences vary according to the domain arrangements of NRPSs, referring to condensation domains located C-terminal to epimerization domains, condensation domains located C-terminal to thiolation domains, and condensation domains involved in initiation of acyl-transfer during assembly of lipopeptides. Understanding the mechanism by which the lipid moieties are covalently attached to the peptide core would allow for introduction of alternative fatty acyl moieties onto a given peptide core by means of recombinant DNA technologies, or for an increase in the yield of product(s) containing the desirable fatty acyl moiety or moieties by recombinant DNA technologies.

[0009] Selective feeding experiments indicate that growth nutrients can affect the relative amounts of lipopeptide products. Growth conditions that favor the synthesis of one given lipid precursor will preferentially lead to the synthesis of the corresponding lipopeptide containing that lipid moiety. For example, daptomycin is normally produced by S. roseosporus in trace amounts. A great deal of effort is required to generate adequate amounts of biologically pure daptomycin. Continuous feeding of fermentation cultures with caproic acid or decanoic acid mixed 1:1 (v:v) in methyl oleate has been shown to increase the yield of daptomycin (R. H. Baltz, Lipopeptide Antibiotics Produced by Streptomyces roseosporus and Streptomyces fradiae, in: Biotechnology of Antibiotics, Second Edition, pp. 415-435, edited by W. R. Strohl). Alternatively, a chemical process requiring enzymatic deacylation of A21978C factors, protection of a certain reactive sidechain in the peptide portion of the compound, synthetic addition of the fatty acyl group, and finally deprotection to yield the desired daptomycin product has been developed. However, these methods are compound-specific, laborious and inefficient, and highlight the need for improved methods of producing lipopeptides and derivatives thereof.

SUMMARY OF THE INVENTION

[0010] In one aspect, the invention provides an isolated polynucleotide encoding an acyl-specific C-domain, wherein said isolated polynucleotide encodes a polypeptide which comprises at least 45% sequence identity to at least one sequence selected from SEQ ID NOS: 1 and 2. Certain embodiments expressly exclude one or more sequences, in particular the nucleotide sequence corresponding to the C-domain of NRPS protein of GenBank accession no. CAB 38518, i.e. coordinates 195135 to 217526 of GenBank nucleotide accession AL939115, and SEQ ID NO: 21. Other embodiments exclude nucleic acid sequences originating from an organism other than an organism of the actinomycetes taxon. Other sequences can be excluded without departing from the scope of the invention. In a related aspect the invention provides an isolated polynucleotide comprising a sequence selected from the group consisting of: (a) a sequence selected from the group consisting of SEQ ID NOS: 5, 7, 9, 11, 13, 15, 17 and 19; (b) a sequence that is complementary to (a); (c) a sequence which hybridizes to said sequence of (a) or (b) under conditions of high stringency; and (d) a sequence which has at least 70% or higher homology to said sequence of (a), (b), or (c). Certain embodiments expressly exclude one or more sequences, in particular the nucleotide sequence corresponding to the C-domain of NRPS protein of GenBank accession no. CAB 38518, i.e. coordinates 195135 to 217526 of GenBank nucleotide accession AL939115, and SEQ ID NO: 21. Other embodiments exclude nucleic acid sequences originating from an organism other than an organism of the actinomycetes taxon. Other sequences can be excluded without departing from the scope of the invention. In one embodiment of the invention, the acyl-specific C-domain encoded by the isolated polynucleotide is involved in lipopeptide acyl-capping. In one embodiment the acyl-specific C-domains reside in cosmids 008CH, 184CM and 024CK having accession numbers IDAC 190901-2, IDAC 260202-1 and IDAC 260202-5, respectively.

[0011] In a further embodiment, the isolated polynucleotide encoding an acyl-specific C-domain resides in a gene locus selected from the group consisting of the biosynthetic locus for ramoplanin from Actinoplanes sp. ATCC 33076; the biosynthetic locus for A21978C from Streptomyces roseosporus NRRL 11379; the biosynthetic locus for A54145 from Streptomyces fradiae ATCC 18158; the biosynthetic locus for the calcium-dependent antibiotic from Streptomyces coelicolor A3(2); the biosynthetic locus for a lipopeptide natural product from Streptomyces ghanaensis NRRL B-12104; the biosynthetic locus for a lipopeptide natural product from Streptomyces refuineus NRRL 3143; the biosynthetic locus for a lipopeptide natural product from Streptomyces aizunensis NRRL B-11277; the biosynthetic locus for a lipopeptide natural product from Actinoplanes nipponensis FD 24834 ATCC 31145; and the biosynthetic locus for a lipopeptide natural product from a Streptomyces sp. organism.

[0012] In another embodiment, the isolated polynucleotide encoding an acyl-specific C-domain does not reside in the biosynthetic locus for the calcium-dependent antibiotic from Streptomyces coelicolor A3(2) (CADA).

[0013] The invention provides two or more isolated polynucleotides, wherein the first polynucleotide encodes a polypeptide which comprises at least 45% sequence identity to at least one sequence selected from SEQ ID NOS: 1 and 2, and the second polynucleotide encodes a polypeptide selected from the group consisting of a polypeptide having at least 55% sequence identity to SEQ ID NO: 3 and a polypeptide having at least 50% sequence identity to SEQ ID NO: 4. In a related aspect the invention provides two or more isolated polynucleotides wherein the first polynucleotide encodes an acyl-specific C-domain and the second polynucleotide encodes an adenylating enzyme, an acyl carrier protein or a fusion of an adenylating enzyme and an acyl carrier protein.

[0014] The invention also provides an isolated polynucleotide comprising a sequence selected from the group consisting of: (a) a sequence selected from the group consisting of SEQ ID NOS: 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45 and 47; (b) a sequence that is complementary to (a); (c) a sequence which hybridizes to said sequence of (a) or (b) under conditions of high stringency; and (d) a sequence which has at least 70% or higher homology to said sequence of (a), (b), or (c). In one embodiment the polynucleotide encodes a polypeptide having at least 55% sequence identity to SEQ ID NO: 3. In another embodiment, the polynucleotide encodes a polypeptide having at least 50% sequence identity to SEQ ID NO: 4.

[0015] In one embodiment the polynucleotide encodes an adenylating enzyme. In another embodiment the polynucleotide encodes an acyl carrier protein. In a further embodiment, the polynucleotide encodes a fusion of an adenylating enzyme and an acyl carrier protein. In another embodiment the polynucleotide encoding an adenylating enzyme, an acyl carrier protein or a fusion of the two is from a biosynthetic locus selected from the group consisting of the biosynthetic locus for ramoplanin from Actinoplanes sp. ATCC 33076; the biosynthetic locus for A21978C from Streptomyces roseosporus NRRL 11379; the biosynthetic locus for A54145 from Streptomyces fradiae ATCC 18158; the biosynthetic locus for a lipopeptide natural product from Streptomyces ghanaensis NRRL B-12104; the biosynthetic locus for a lipopeptide natural product from Streptomyces refuineus NRRL 3143; the biosynthetic locus for a lipopeptide natural product from Streptomyces aizunensis NRRL B-11277; the biosynthetic locus for a lipopeptide natural product from Actinoplanes nipponensis FD 24834 ATCC 31145; and the biosynthetic locus for a lipopeptide natural product from a Streptomyces sp. organism. In one embodiment the adenylating enzyme is from cosmids 008CO and 024CK having accession numbers IDAC 190901-2 and IDAC 260202-5, respectively. In another embodiment the acyl carrier protein is from cosmids 008CH and 024CK having accession numbers IDAC 190901-3 and IDAC 260202-5, respectively. In one embodiment the fusion protein containing an adenylating enzyme and an acyl carrier protein is from cosmid 184CM having accession number IDAC 260202-1.

[0016] The invention also provides an isolated acyl-specific C-domain comprising at least 45% sequence homology to at least one sequence selected from SEQ ID NO: 1 and SEQ ID NO: 2. Certain embodiments expressly exclude one or more sequences, in particular the polypeptide sequence corresponding to the C-domain of NRPS protein of GenBank accession no. CAB 38518, and SEQ ID NO: 22. Other embodiments exclude polypeptide sequences originating from an organism other than an organism of the actinomycetes taxon. Other sequences can be excluded without departing from the scope of the invention. In a related aspect, the invention provides an isolated acyl-specific C-domain comprising a polypeptide sequence selected from the group consisting of: (a) a sequence selected from the group consisting of SEQ ID NOS: 6, 8, 10, 12, 14, 16, 18, 20 and 22; and (b) a sequence which has at least 70% or higher homology to said sequence of (a). Certain embodiments expressly exclude one or more sequences, in particular the polypeptide sequence corresponding to the C-domain of NRPS protein of GenBank accession no. CAB 38518, and SEQ ID NO: 22. Other embodiments exclude polypeptide sequences originating from an organism other than an organism of the actinomycetes taxon. Other sequences can be excluded without departing from the scope of the invention.

[0017] The invention further provides two or more isolated polypeptides, wherein the first isolated polypeptide is an acyl-specific C-domain comprising at least 45% sequence homology to at least one sequence selected from SEQ ID NO: 1 and SEQ ID NO: 2, and the second isolated polypeptide is selected from the group consisting of a polypeptide having at least 55% identity to SEQ ID NO: 3 and a polypeptide having at least 50% identity to SEQ ID NO: 4. In still a further aspect, the invention provides an N-acyl-capping cassette comprising at least one acyl-specific C-domain polypeptide and another polypeptide selected from the group consisting of an adenylating protein and an acyl-carrier protein.

[0018] In one embodiment, the isolated acyl-specific C-domain is not from the biosynthetic locus for the calcium-dependent antibiotic from Streptomyces coelicolor A3(2) (CADA).

[0019] The invention provides an isolated polypeptide comprising a polypeptide selected from the group consisting of: (a) SEQ ID NOS: 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46 and 48; and (b) a sequence which has at least 70% or higher homology to said sequence of (a). In one embodiment, such isolated polypeptide is not from the biosynthetic locus for the calcium-dependent antibiotic from Streptomyces coelicolor A3(2) (CADA).

[0020] The invention further provides a computer readable medium comprising a computer program and data, comprising: (a) a computer program stored on said media containing instructions sufficient to implement a process for effecting the identification, analysis, or modeling of a representation of a polynucleotide or polypeptide sequence; (b) data stored on said media representing a sequence of a polynucleotide selected from the group consisting of: (i) a polynucleotide encoding an acyl-specific C-domain, said polynucleotide encoding a polypeptide having at least 45% sequence identity with either SEQ ID NO: 1 or SEQ ID NO: 2; (ii) a polynucleotide encoding a polypeptide having at least 55% sequence identity with SEQ ID NO: 3; and (iii) a polynucleotide encoding a polypeptide having at least 50% sequence identity with SEQ ID NO: 4; and (c) a data structure reflecting the underlying organization and structure of said data to facilitate said computer program access to data elements corresponding to logical sub-components of the sequence, said data structure being inherent in said program and in the way in which said computer program organizes and accesses said data. In a related aspect, the invention provides a computer readable medium comprising a computer program and data, comprising: (a) a computer program stored on said media containing instructions sufficient to implement a process for effecting the identification, analysis, or modeling of a representation of a polypeptide sequence; (b) data stored on said media representing a sequence of a polypeptide selected from the group consisting of: (i) a polypeptide representing an acyl-specific C-domain and having at least 45% sequence identity with either SEQ ID NO: 1 or SEQ ID NO: 2; (ii) a polypeptide having at least 55% sequence identity with SEQ ID NO: 3; and (iii) a polypeptide having at least 50% sequence identity with SEQ ID NO: 4; and (c) a data structure reflecting the underlying organization and structure of said data to facilitate said computer program access to data elements corresponding to logical sub-components of the sequence, said data structure being inherent in said program and in the way in which said computer program organizes and accesses said data.

[0021] The invention also provides a memory for storing data that can be accessed by a computer programmed to implement a process for effecting the identification, analysis, or modeling of a sequence of a polynucleotide or a polypeptide, said memory comprising data representing a polynucleotide selected from the group consisting of: (a) a polynucleotide encoding an acyl-specific C-domain, said polynucleotide encoding a polypeptide having at least 45% sequence identity with either SEQ ID NO: 1 or SEQ ID NO: 2; (b) a polynucleotide encoding a polypeptide having at least 55% sequence identity with SEQ ID NO: 3; and (c) a polynucleotide encoding a polypeptide having at least 50% sequence identity with SEQ ID NO: 4. In a related aspect, the invention provides a memory for storing data that can be accessed by a computer programmed to implement a process for effecting the identification, analysis, or modeling of a sequence of a polypeptide, said memory comprising data representing a polypeptide selected from the group consisting of: (a) a polypeptide having at least 45% sequence identity with either SEQ ID NO: 1 or SEQ ID NO: 2; (b) a polypeptide having at least 55% sequence identity with SEQ ID NO: 3; and (c) a polypeptide having at least 50% sequence identity with SEQ ID NO: 4.

[0022] The invention provides a method for detecting a polypeptide involved in lipopeptide biosynthesis or a polynucleotide encoding such a polypeptide comprising the step of identifying (a) a polypeptide having at least 45% sequence identity to SEQ ID NO: 1 or SEQ ID NO: 2, or (b) a polynucleotide encoding a polypeptide having at least 45% sequence identity to SEQ ID NO: 1 or SEQ ID NO: 2, wherein said at least 45% sequence identity indicates a polypeptide involved in lipopeptide biosynthesis. In one embodiment the method comprises the steps of: (a) providing a reference polynucleotide or polypeptide sequence selected from the group consisting of polynucleotide or polypeptide sequences representing an acyl-specific domain; (b) comparing said reference sequence to one or more candidate polynucleotide or polypeptide sequences stored on a computer readable medium; (c) determining a level of homology between said reference sequence and said one or more candidate sequences; and (d) identifying a candidate sequence which shares at least 70% homology with the reference sequence. In one embodiment the method further comprises the step of identifying, in proximity to the polypeptide of (a) or the polynucleotide of (b), at least (c) one polypeptide having at least 55% sequence identity to SEQ ID NO: 3 or one polynucleotide sequence encoding a polypeptide having at least 55% sequence identity to SEQ ID NO: 3; or (d) one polypeptide having at least 50% sequence identity to SEQ ID NO: 4 or one polynucleotide sequence encoding a polypeptide having at least 50% sequence identity to SEQ ID NO: 4. In another embodiment of the method the polypeptide of (c) is a polypeptide of SEQ ID NO: 24, 26, 28, 30, 32, 34, 36, 38 or 40, or a polypeptide having at least 70% sequence identity to a polypeptide of SEQ ID NO: 24, 26, 28, 30, 32, 34, 36, 38 or 40; or the nucleotide of (d) is a nucleotide encoding a polypeptide of SEQ ID NO: 24, 26, 28, 30, 32, 34, 36, 38 or 40 or a nucleotide encoding a polypeptide having at least 70% sequence identity to a polypeptide of SEQ ID NO: 24, 26, 28, 30, 32, 34, 36, 38 or 40.
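
The identity-threshold screen of paragraph [0022] can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the invention: the function names are hypothetical, the sequences are assumed to have been pairwise aligned beforehand by an external tool, and percent identity is computed here as matches over columns in which neither sequence has a gap, which is only one of several conventions in use.

```python
from typing import Dict, List, Tuple


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over gap-free columns of a pairwise alignment (assumed convention)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def screen_candidates(
    alignments: Dict[str, Tuple[str, str]], threshold: float = 45.0
) -> List[str]:
    """Return names of candidates whose identity to the reference meets the threshold.

    `alignments` maps a candidate name to (aligned reference, aligned candidate).
    """
    return [
        name
        for name, (ref, cand) in alignments.items()
        if percent_identity(ref, cand) >= threshold
    ]


# Toy alignments with made-up sequences, for illustration only.
toy = {
    "candidate_1": ("MKTA-LLQ", "MKTAGLLE"),
    "candidate_2": ("MKTALLQ", "GHWPRRS"),
}
print(screen_candidates(toy))
```

The default threshold of 45% mirrors the figure recited for SEQ ID NO: 1 and SEQ ID NO: 2; the 55% and 50% figures recited for SEQ ID NO: 3 and SEQ ID NO: 4 would be passed as the `threshold` argument.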

[0023] The invention provides a computer system comprising: (a) a database of reference sequences, wherein the reference sequences encode proteins involved in lipid biosynthesis, and wherein the reference sequences include one or more of: (i) a polypeptide sequence representing an acyl-specific C-domain or a polynucleotide encoding an acyl-specific C-domain; and (b) a user interface capable of: (ii) receiving a test sequence for comparing against each of the reference sequences in the database; and (iii) displaying the results of the comparison. In one embodiment, reference sequences of the computer system further include one or more of: (iv) a polypeptide sequence representing an adenylating enzyme or a polynucleotide encoding an adenylating enzyme; and (v) a polypeptide sequence representing an acyl carrier protein or a polynucleotide encoding an acyl carrier protein. In another embodiment, the reference sequence of (i) is selected from SEQ ID NOS: 1, 2, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21 and 22; the reference sequence of (iv) is selected from SEQ ID NOS: 3, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33 and 34; and the reference sequence of (v) is selected from SEQ ID NOS: 4, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47 and 48.
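
The database-and-comparison arrangement of paragraph [0023] can be sketched in a few lines. Everything named below is a hypothetical illustration: the `ReferenceEntry` record, the entry labels and the toy sequences are invented, and the default scoring function uses the generic string-similarity ratio from Python's standard `difflib` module purely as a stand-in, since the text does not specify a comparison algorithm. A real system would substitute a proper sequence alignment score.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class ReferenceEntry:
    identifier: str  # hypothetical label, not a real SEQ ID NO
    category: str    # e.g. "acyl-specific C-domain"
    sequence: str


def similarity(a: str, b: str) -> float:
    """Stand-in score in [0, 1]; NOT a biological alignment score."""
    return SequenceMatcher(None, a, b).ratio()


def compare_to_database(
    test_sequence: str,
    database: List[ReferenceEntry],
    score: Callable[[str, str], float] = similarity,
) -> List[Tuple[str, str, float]]:
    """Compare the test sequence against every reference entry, best match first."""
    results = [
        (entry.identifier, entry.category, score(test_sequence, entry.sequence))
        for entry in database
    ]
    return sorted(results, key=lambda row: row[2], reverse=True)


# Toy database with made-up sequences, for illustration only.
database = [
    ReferenceEntry("ref_c_domain", "acyl-specific C-domain", "MKTLLPAWQ"),
    ReferenceEntry("ref_adenylating", "adenylating enzyme", "GGSTHHKRV"),
]

# "Display" step: print one tab-separated row per reference sequence.
for identifier, category, value in compare_to_database("MKTLLPAWQ", database):
    print(f"{identifier}\t{category}\t{value:.2f}")
```

Keeping the scoring function as a parameter separates the database and display logic from the choice of comparison method, so the stand-in can be replaced without touching the rest of the sketch.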

BRIEF DESCRIPTION OF THE DRAWINGS

[0024] FIGS. 1a, 1b, 1c, 1d and 1e represent schematic views of the biosynthetic loci for: (1a) ramoplanin from Actinoplanes sp. ATCC 33076 (RAMO) and A21978C from Streptomyces roseosporus NRRL 11379 (DAPT); (1b) A54145 from Streptomyces fradiae ATCC 18158 (A541) and the lipopeptide from Streptomyces ghanaensis NRRL B-12104 (009H); (1c) a lipopeptide from Streptomyces refuineus NRRL 3143 (024A) and a lipopeptide from Streptomyces aizunensis NRRL B-11277 (023C); (1d) a lipopeptide from Actinoplanes nipponensis FD 24834 ATCC 31145 (A410) and a putative lipopeptide natural product (070B) from organism 070 in Ecopia's private culture collection; and (1e) the calcium-dependent antibiotic from Streptomyces coelicolor A3(2) (CADA), showing a scale in base pairs, and the relative position and orientation of open reading frames (ORFs) encoding representative acyl-specific C-domains of the invention and representative adenylating enzymes and acyl carrier proteins of the invention. Deposited cosmids containing genes of the invention are also indicated in regard to RAMO, A541 and 024A.

[0025] FIG. 2 represents a dendrogram showing the evolutionary relatedness of C-domains from various lipopeptide NRPSs with a clearly branching cluster of representative C-domains of the invention involved in N-acylation highlighted in gray.

[0026] FIGS. 3a and 3b represent an amino acid alignment of representative acyl-specific C-domains of the invention as found in each of the RAMO, DAPT, A541, CADA, 009H, 024A, 023C, A410 and 070B lipopeptide biosynthetic loci. Conserved motifs are highlighted. In each of the Clustal alignments a line above the alignment is used to mark strongly conserved positions. In addition, three characters, namely “★” (asterisk), “:” (colon) and “.” (period), are used, wherein “★” indicates positions which have a single, fully conserved residue; “:” indicates that one of the following strong groups is fully conserved: STA, NEQK, NHQK, NDEQ, QHRK, MILV, MILF, HY, and FYW; and “.” indicates that one of the following weaker groups is fully conserved: CSA, ATV, SAG, STNK, STPA, SGND, SNDEQK, NDEQHK, NEQHRK, FVLIM, and HFY.
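
The conservation symbols described in paragraph [0026] follow a simple rule per alignment column, which can be sketched as below. The group lists are copied from the paragraph; the function name, the use of a plain ASCII "*" in place of the asterisk glyph, and the treatment of any column containing a gap as unmarked are assumptions made for this illustration.

```python
# Strong and weaker residue groups as listed in paragraph [0026].
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = [
    "CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
    "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY",
]


def column_symbol(column: str) -> str:
    """Return '*', ':', '.' or ' ' for one alignment column (one residue per sequence)."""
    residues = set(column.upper())
    if not residues or "-" in residues:
        return " "  # assumption: columns with gaps are left unmarked
    if len(residues) == 1:
        return "*"  # single, fully conserved residue
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"  # all residues fall within one strong group
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."  # all residues fall within one weaker group
    return " "


def consensus_line(aligned_sequences: list) -> str:
    """Build the symbol line for a list of equal-length aligned sequences."""
    return "".join(column_symbol("".join(col)) for col in zip(*aligned_sequences))


# Toy alignment with made-up sequences, for illustration only.
print(repr(consensus_line(["ASCFW", "ATSVA", "AAAL-"])))
```

The strong groups are tested before the weaker ones so that a column satisfying both is marked with the stronger symbol.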

[0027] FIGS. 4a, 4b and 4c represent an amino acid alignment of representative ADLE proteins of the invention as found in each of the RAMO, DAPT, 009H, 024A, 023C and A410 loci, together with the ADLE portion of the ADLF fusion protein from the A541 locus. Conserved motifs of acyl CoA ligases are indicated by lines on top of the sequences.

[0028] FIG. 5 is an amino acid alignment of representative ACPH proteins of the invention from the RAMO, DAPT, 009H, 024A, 023C and A410 loci together with the corresponding portion of the ADLF fusion protein from the A541 locus. The conserved serine residue of the thiolation domain to which a phosphopantetheine group is covalently attached post-translationally is highlighted.

[0029] FIG. 6a is a dendrogram showing the evolutionary relatedness of the representative NRPS C-domains of the invention. FIG. 6b is a dendrogram showing the evolutionary relatedness of the representative ADLE proteins of the invention. FIG. 6c is a dendrogram showing the evolutionary relatedness of the representative ACPH proteins of the invention.

[0030] FIGS. 7a and 7b illustrate a general biosynthetic scheme for formation of N-acyl peptide linkage in lipopeptides using the acyl-specific C-domain, ADLE protein and ACPH protein of the invention.

[0031] FIG. 8 illustrates the biosynthetic scheme of FIGS. 7a and 7b as applied to formation of the N-acyl peptide linkage in ramoplanin and A54145.

[0032] FIGS. 9a and 9b are photographs of plates generated in the bioassay of anionic lipopeptide isolation experiments and illustrating an enrichment of activity, based on IRA67 anion exchange chromatography of lipopeptides from Streptomyces refuineus subsp. thermotolerans and Streptomyces fradiae.

[0033] FIGS. 10a and 10b illustrate use of NRPS biosynthetic machinery of a nonlipopeptide natural product, complestatin, to produce an N-acylated analogue of complestatin. FIG. 10a illustrates the biosynthesis of complestatin. FIG. 10b illustrates a rationally designed recombinant NRPS system that gives rise to N-acylated complestatin analogue(s).

[0034] FIG. 11 is a block diagram of a computer system according to one embodiment of the invention.

[0035] FIG. 12 is a flow chart representing a process performed by the computer system to compare candidate sequences with one or more reference sequences according to one embodiment of the invention.

[0036] FIG. 13 is a flow chart representing a process performed by the computer system to compare candidate sequences with one or more reference sequences and the display of comparison results according to one embodiment of the invention.

[0037] FIG. 14 is a chart of the domains disclosed according to some embodiments of the invention.

DETAILED DESCRIPTION

[0038] The invention provides compositions, methods and systems useful in the discovery and engineering of lipopeptides and related compounds. The compositions can be used in identifying lipopeptide natural products, lipopeptide genes, lipopeptide gene clusters and lipopeptide-producing organisms.

[0039] Lipopeptide biosynthetic loci from a variety of organisms were discovered and analyzed. For convenience, each lipopeptide biosynthetic locus and the organism in which it is found are sometimes indicated by reference to a source designation wherein “RAMO” refers to the biosynthetic locus for ramoplanin from Actinoplanes sp. ATCC 33076, “DAPT” refers to the biosynthetic locus for A21978C from Streptomyces roseosporus NRRL 11379, “A541” refers to the biosynthetic locus for A54145 from Streptomyces fradiae ATCC 18158, “CADA” refers to the biosynthetic locus for the calcium-dependent antibiotic from Streptomyces coelicolor A3(2) (Bentley et al., 2002, Nature, vol. 417, pp. 141-147), “009H” refers to the biosynthetic locus for a lipopeptide natural product from Streptomyces ghanaensis NRRL B-12104, “024A” refers to the biosynthetic locus for a lipopeptide natural product from Streptomyces refuineus NRRL 3143, “023C” refers to a biosynthetic locus for a lipopeptide natural product from Streptomyces aizunensis NRRL B-11277, “A410” refers to the biosynthetic locus for a lipopeptide natural product from Actinoplanes nipponensis FD 24834 ATCC 31145, and “070B” refers to the biosynthetic locus for a lipopeptide natural product from a Streptomyces sp. organism in Ecopia's private culture collection.

[0040] Surprisingly, a conserved gene domain and conserved genes common to lipopeptide biosynthetic loci have been discovered. The conserved domain is referred to as an “acyl-specific C-domain (unusual C-domain)”, which means a condensation domain (C-domain) involved in N-acyl capping for lipopeptide biosynthesis. The “acyl-specific C-domain” is required for the N-acyl peptide linkage found in lipopeptides between the lipid moiety and the first amino acid residue of the peptide core. Representative examples of the acyl-specific C-domains of the invention include the acyl-specific C-domain residing in the ramoplanin biosynthetic locus from Actinoplanes sp. ATCC 33076 (SEQ ID NO: 6), the acyl-specific C-domain residing in the A21978C locus in Streptomyces roseosporus NRRL 11379 (SEQ ID NO: 8), the acyl-specific C-domain residing in the A54145 locus in Streptomyces fradiae ATCC 18158 (SEQ ID NO: 10), the acyl-specific C-domain residing in a lipopeptide biosynthetic locus in Streptomyces ghanaensis NRRL B-12104 (SEQ ID NO: 12), the acyl-specific C-domain residing in a lipopeptide biosynthetic locus in Streptomyces refuineus NRRL 3143 (SEQ ID NO: 14), the acyl-specific C-domain residing in a lipopeptide biosynthetic locus in Streptomyces aizunensis NRRL B-11277 (SEQ ID NO: 16), the acyl-specific C-domain residing in the A41,012 lipopeptide biosynthetic locus in Actinoplanes nipponensis FD 24834 ATCC 31145 (SEQ ID NO: 18), the acyl-specific C-domain residing in a putative lipopeptide biosynthetic locus from the Streptomyces sp. organism 070 in Ecopia's private culture collection (SEQ ID NO: 20) and the acyl-specific C-domain residing in the biosynthetic locus for the calcium-dependent antibiotic from Streptomyces coelicolor A3(2) (SEQ ID NO: 22). Certain embodiments expressly exclude the acyl-specific C-domain residing in the calcium-dependent antibiotic biosynthetic locus from Streptomyces coelicolor A3(2) (SEQ ID NO: 22 and the polypeptide sequence corresponding to the C-domain of NRPS protein of GenBank accession no. CAB38518). Other embodiments exclude polypeptide sequences originating from an organism other than an organism of the actinomycetes taxon.

[0041] An “acyl-specific C-domain” of the present invention is definedstructurally as a polypeptide sequence that produces an alignment withat least 45% identity to one of the two following consensus sequencesusing the BLASTP 2.0.10 algorithm (with the filter option -F set tofalse, the gap opening penalty -G set to 11, the gap extension penalty-E set to 1, and all remaining options set to defaultvalues): >Consensus sequence 1GgIReLmAgQLAvWhAqQLaPenPvYnvGEYveidGevDIdLLvaAvrrvmeEadaaRLRfrevDgvPRQYfaedeDypveViDvSaeaDPrAAAeSIMaaDLrRprDlrdgeLytqkiykvgedlvfWYqRahHiiIDGrSaGIVaSRVAaVYsALaaGgdveegALPsssVLmdAedeYraSeefeIDReYWreaLAgIPeevslganePsrlprepvRheedvsdaaAaeLraaARRLgTSIAgIaiAAAAIYqHrITGqrDVvvgVPVaGRsktaeIdiPGMTaNVvPvRIAVaPkttVaeLvrqvaRGVrdGLRHQRYrYediIdDIkLvgrdgLypIIVNvISfDydLrFGdAvsvahgLSagpvddvsIdvYdrsSdGsmkvvvdvNPDItdrsdadEvarkFIaIIrWLaesdAeepVaridLIded >Consensus sequence 2svRhgvtaAQrgvWvAQQLrpdsrIYnCGIyLeIdgaIDpavLsrAvRrtIaeTEALRsrFeedddGaIIqrvIapaPdeqtrIIeDGvPYtPvLLRHiDIsgddDPeaAArrWMDadIAePvdLdragtsrHaLItLGgdRhLIYIgYHHiaLDGfGaaLYIdRIAaVYrALrtGrePppcpFgpLdrIvaeeaaYrdSaRhrrDrayWtgrfadIpEPvgLagraAaAapapLRrtvrLpperTaaLaaaAeatGsrWpavviAAVAAFIrRIagaeeVVvgLPVTARvTrAAIrTPGMLaNvIPLRLeVrqgasfAaLIeetsraIsaILRHQRFRGEdLgReLGIaGerAgIapttVNVMaFapvIdFGdcrAvvHqLSsGPVeDLaInIyGTPgtGdeIrvtvaANPalYtaddVaslqeRLvRfLaalgaDPaapvGrvrLLdp a

[0042] where consensus sequence 1 is based on the sequences of the acyl-specific condensation domains from the calcium-dependent antibiotic (CADA) locus in Streptomyces coelicolor A3(2) (GenBank accession numbers CAB38517, CAB38518, CAB38516 and CAB38876), A21978C (DAPT) locus in Streptomyces roseosporus (NRRL 11379), A54145 locus in Streptomyces fradiae (ATCC 18158), A410 locus from Actinoplanes nipponensis, 009H locus from Streptomyces ghanaensis (NRRL B-12104), and 024A locus in Streptomyces refuineus (NRRL 3143); and where consensus sequence 2 is based on the sequences of the acyl-specific condensation domains from the ramoplanin (RAMO) locus (Actinoplanes sp. ATCC 33076), 023C locus from Streptomyces aizunensis (NRRL B-11277), and 070B, a putative lipopeptide locus from Ecopia's private culture collection.

[0043] The consensus sequences were generated as follows. First, the listed sequences were aligned with the ClustalX 1.81 program using default settings. Then a profile hidden Markov model (HMM) was made from the alignment file with the hmmbuild program of the HMMER 2.2 package (Sean Eddy, Washington University; world-wide-web hmmer.wustl.edu/) and was calibrated with the hmmcalibrate program of the HMMER package, both using default settings. Briefly, a profile hidden Markov model is a statistical description of a sequence family's consensus. HMMER is a freely distributable implementation of profile HMM software for protein sequence analysis and is available from the above web site. Finally, the consensus sequences were generated from the HMM with the hmmemit program of the HMMER package using the -c option so as to predict a single majority rule consensus sequence from the HMM's probability distribution. Highly conserved amino acid residues (p>=0.5) are shown in upper case in the consensus sequence; others are shown in lower case.
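The upper-case and lower-case convention described above can be illustrated with a simplified sketch. The snippet below is not hmmemit and does not build a profile HMM; as an assumption made purely for illustration, it takes the residue frequency in each column of a gap-free toy alignment as a stand-in for the HMM's emission probabilities. The function name and the toy alignment are hypothetical.

```python
from collections import Counter


def majority_consensus(aligned, threshold=0.5):
    """Majority-rule consensus of equal-length, gap-free aligned sequences.

    The most common residue of each column is written in upper case when
    its column frequency is >= threshold, and in lower case otherwise.
    Column frequency is an illustrative stand-in for an HMM probability.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must have equal length")
    consensus = []
    for column in zip(*aligned):
        # Ties are resolved in favour of the residue seen first in the column.
        residue, count = Counter(c.upper() for c in column).most_common(1)[0]
        frequency = count / len(column)
        consensus.append(residue if frequency >= threshold else residue.lower())
    return "".join(consensus)


if __name__ == "__main__":
    toy_alignment = ["ACD", "ACE", "AGF", "AHG"]  # hypothetical sequences
    print(majority_consensus(toy_alignment))
```

In this toy case the first two columns meet the 0.5 cut-off and are emitted in upper case, while the third column has no majority residue and is emitted in lower case.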

[0044] A “polynucleotide encoding an acyl-specific condensation domain (C-domain)” refers to a polynucleotide encoding an acyl-specific C-domain. Representative examples of a polynucleotide encoding an acyl-specific C-domain of the invention include the polynucleotide encoding the acyl-specific C-domain residing in the ramoplanin biosynthetic locus from Actinoplanes sp. ATCC 33076 (SEQ ID NO: 5), the polynucleotide encoding the acyl-specific C-domain residing in the A21978C locus in Streptomyces roseosporus NRRL 11379 (SEQ ID NO: 7), the polynucleotide encoding the acyl-specific C-domain residing in the A54145 locus in Streptomyces fradiae ATCC 18158 (SEQ ID NO: 9), the polynucleotide encoding the acyl-specific C-domain residing in a lipopeptide biosynthetic locus in Streptomyces ghanaensis NRRL B-12104 (SEQ ID NO: 11), the polynucleotide encoding the acyl-specific C-domain residing in a lipopeptide biosynthetic locus in Streptomyces refuineus NRRL 3143 (SEQ ID NO: 13), the polynucleotide encoding the acyl-specific C-domain residing in a lipopeptide biosynthetic locus in Streptomyces aizunensis NRRL B-11277 (SEQ ID NO: 15), the polynucleotide encoding the acyl-specific C-domain residing in a lipopeptide biosynthetic locus in Actinoplanes nipponensis FD 24834 ATCC 31145 (SEQ ID NO: 17), the polynucleotide encoding the acyl-specific C-domain residing in a biosynthetic locus of a Streptomyces sp. in Ecopia's private culture collection (SEQ ID NO: 19), and the polynucleotide encoding the acyl-specific C-domain residing in the calcium-dependent antibiotic biosynthetic locus from Streptomyces coelicolor A3(2) (SEQ ID NO: 21). Certain embodiments expressly exclude polynucleotides encoding the acyl-specific C-domain residing in the calcium-dependent antibiotic biosynthetic locus from Streptomyces coelicolor A3(2) (SEQ ID NO: 21 and nucleotide sequences encoding the polypeptide sequence of the C-domain of NRPS protein of GenBank accession no. CAB38518, i.e. coordinates 195135 to 217526 of nucleotide accession AL939115 represent the nucleotide sequence of the NRPS of CAB38518). Other embodiments exclude polypeptide sequences originating from an organism other than an organism of the actinomycetes taxon.

[0045] The acyl-specific C-domains of SEQ ID NOS: 6, 8, 10, 12, 14, 16, 18 and 20 were compared using the BLASTP algorithm with the default parameters to the sequences of the National Center for Biotechnology Information (NCBI) nonredundant protein database and to sequences of the DECIPHER® database of microbial genes, pathways and natural products (Ecopia BioSciences Inc., St-Laurent, Canada). The accession numbers of the top GenBank hits of this BLAST analysis are presented in Table 1 along with the corresponding E values. The E value assists in the determination of whether two sequences display sufficient similarity to justify an inference of homology. The E value gives the expected number of chance alignments with an alignment score at least equal to the observed alignment score. An E value of 0.00 indicates a perfect homolog. The E values are calculated as described in Altschul et al., 1990, J. Mol. Biol. 215(3):403-410; Gish et al., 1993, Nature Genetics 3:266-272.

TABLE 1

SEQ ID | Family | Locus | #aa | GenBank homology | probability | % identity | % similarity | proposed function of GenBank match
6 | NRPS-Cdom | RAMO | 435aa | AAD56240.1, 2378aa | 6e−46 | 111/391 (28.39%) | 167/391 (42.71%) | DhbF, Bacillus subtilis
6 | NRPS-Cdom | RAMO | 435aa | BAB69380.1, 1440aa | 1e−44 | 126/436 (28.9%) | 177/436 (40.6%) | non-ribosomal peptide synthetase, Streptomyces avermitilis
6 | NRPS-Cdom | RAMO | 435aa | T30288, 2591aa | 3e−42 | 128/429 (29.84%) | 170/429 (39.63%) | pristinamycin I synthase 2, Streptomyces pristinaespiralis
8 | NRPS-Cdom | DAPT | 435aa | NP_627443.1, 7463aa | 6e−64 | 160/451 (35.48%) | 214/451 (47.45%) | CDA peptide synthetase I, Streptomyces coelicolor
8 | NRPS-Cdom | DAPT | 435aa | AAD56240.1, 2378aa | 2e−57 | 133/430 (30.93%) | 199/430 (46.28%) | DhbF, Bacillus subtilis
8 | NRPS-Cdom | DAPT | 435aa | T14591, 2611aa | 3e−57 | 146/432 (33.8%) | 196/432 (45.37%) | actinomycin synthetase II, Streptomyces chrysomallus
10 | NRPS-Cdom | A541 | 453aa | NP_627443.1, 7463aa | 9e−48 | 150/462 (32.47%) | 187/462 (40.48%) | CDA peptide synthetase I, Streptomyces coelicolor
10 | NRPS-Cdom | A541 | 453aa | AAD56240.1, 2378aa | 1e−45 | 131/447 (29.31%) | 190/447 (42.51%) | DhbF, Bacillus subtilis
10 | NRPS-Cdom | A541 | 453aa | T14591, 2611aa | 5e−40 | 138/452 (30.53%) | 186/452 (41.15%) | actinomycin synthetase II, Streptomyces chrysomallus
12 | NRPS-Cdom | 009H | 432aa | NP_627443.1, 7463aa | 1e−53 | 148/452 (32.74%) | 195/452 (43.14%) | CDA peptide synthetase I, Streptomyces coelicolor
12 | NRPS-Cdom | 009H | 432aa | BAB69380.1, 1440aa | 1e−53 | 140/431 (32.48%) | 198/431 (45.94%) | non-ribosomal peptide synthetase, Streptomyces avermitilis
12 | NRPS-Cdom | 009H | 432aa | AAD56240.1, 2378aa | 8e−53 | 133/434 (30.65%) | 201/434 (46.31%) | DhbF, Bacillus subtilis
14 | NRPS-Cdom | 024A | 438aa | NP_627443.1, 7463aa | 4e−53 | 143/414 (34.54%) | 190/414 (45.89%) | CDA peptide synthetase I, Streptomyces coelicolor
14 | NRPS-Cdom | 024A | 438aa | AAD56240.1, 2378aa | 6e−53 | 135/430 (31.4%) | 202/430 (46.98%) | DhbF, Bacillus subtilis
14 | NRPS-Cdom | 024A | 438aa | BAB69380.1, 1440aa | 7e−47 | 137/405 (33.83%) | 179/405 (44.2%) | non-ribosomal peptide synthetase, Streptomyces avermitilis
16 | NRPS-Cdom | 023C | 432aa | AAD56240.1, 2378aa | 6e−54 | 125/393 (31.81%) | 202/393 (51.4%) | DhbF, Bacillus subtilis
16 | NRPS-Cdom | 023C | 432aa | BAB69380.1, 1440aa | 3e−46 | 124/400 (31%) | 180/400 (45%) | non-ribosomal peptide synthetase, Streptomyces avermitilis
16 | NRPS-Cdom | 023C | 432aa | AAG31130.1, 1456aa | 1e−44 | 128/436 (29.36%) | 196/436 (44.95%) | MxcG, Stigmatella aurantiaca
18 | NRPS-Cdom | A410 | 431aa | AAD56240.1, 2378aa | 1e−68 | 149/428 (34.81%) | 213/428 (49.77%) | DhbF, Bacillus subtilis
18 | NRPS-Cdom | A410 | 431aa | AAG31130.1, 1456aa | 5e−56 | 141/424 (33.25%) | 194/424 (45.75%) | MxcG, Stigmatella aurantiaca
18 | NRPS-Cdom | A410 | 431aa | NP_627443.1, 7463aa | 1e−55 | 144/446 (32.29%) | 199/446 (44.62%) | CDA peptide synthetase I, Streptomyces coelicolor
20 | NRPS-Cdom | 070B | 435aa | AAD56240.1, 2378aa | 7e−86 | 168/423 (39.72%) | 239/423 (56.5%) | DhbF, Bacillus subtilis
20 | NRPS-Cdom | 070B | 435aa | AAG31130.1, 1456aa | 5e−83 | 174/415 (41.93%) | 237/415 (57.11%) | MxcG, Stigmatella aurantiaca
20 | NRPS-Cdom | 070B | 435aa | BAB69380.1, 1440aa | 9e−71 | 155/406 (38.18%) | 215/406 (52.96%) | non-ribosomal peptide synthetase, Streptomyces avermitilis
22 | NRPS-Cdom | CADA | 446aa | NP_627443.1, 7463aa | 0.0 | 446/446 (100%) | 446/446 (100%) | CDA peptide synthetase I, Streptomyces coelicolor
22 | NRPS-Cdom | CADA | 446aa | AAD56240.1, 2378aa | 2e−70 | 159/450 (35.33%) | 236/450 (52.44%) | DhbF, Bacillus subtilis
22 | NRPS-Cdom | CADA | 446aa | T14591, 2611aa | 6e−67 | 171/448 (38.17%) | 224/448 (50%) | actinomycin synthetase, Streptomyces chrysomallus
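The identity and similarity columns of Table 1 pair a count of matching positions over the aligned length with the corresponding percentage. As a minimal sketch of that arithmetic (the function name is hypothetical, and rounding to two decimals is an assumption about how the table values were produced):

```python
def percent(matches, aligned_length):
    """Return matches / aligned_length as a percentage, rounded to 2 decimals."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    if not 0 <= matches <= aligned_length:
        raise ValueError("matches must lie between 0 and aligned_length")
    return round(100.0 * matches / aligned_length, 2)


if __name__ == "__main__":
    # First Table 1 entry for SEQ ID NO: 6: 111/391 identical, 167/391 similar.
    print(percent(111, 391), percent(167, 391))
```

The same helper can be used to spot-check any row of the tables against its reported percentages.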

[0046] As used herein, the term “adenylating enzyme” or ADLE means a member of a family of proteins involved in N-acyl capping for lipopeptide biosynthesis. Representative adenylating enzymes of the invention include the adenylating enzyme residing in the ramoplanin biosynthetic locus from Actinoplanes sp. ATCC 33076 (SEQ ID NO: 22), the adenylating enzyme residing in the A21978C locus in Streptomyces roseosporus NRRL 11379 (SEQ ID NO: 24), the adenylating enzyme residing in a lipopeptide biosynthetic locus in Streptomyces ghanaensis NRRL B-12104 (SEQ ID NO: 26), the adenylating enzyme residing in a lipopeptide biosynthetic locus in Streptomyces refuineus NRRL 3143 (SEQ ID NO: 28), the adenylating enzyme residing in a lipopeptide biosynthetic locus in Streptomyces aizunensis NRRL B-11277 (SEQ ID NO: 30), and the adenylating enzyme residing in a lipopeptide biosynthetic locus in Actinoplanes nipponensis FD 24834 ATCC 31145 (SEQ ID NO: 32). The adenylating enzyme may be a portion of a fusion protein; for example, the adenylating enzyme residing in the A54145 locus in Streptomyces fradiae ATCC 18158 is residues 1 to 648 of a fusion protein designated ADLF (SEQ ID NO: 34).

[0047] The adenylating enzyme is defined structurally as a polypeptidesequence that produces an alignment with at least 55% identity to thefollowing consensus sequence using the BLASTP 2.0.10 algorithm (with thefilter option -F set to false, the gap opening penalty -G set to 11, thegap extension penalty -E set to 1, and all remaining options set todefault values): >Consensus sequence 3vsavmvdIaagpsvpaaLRahAearPdRtAvvfVrDtdradgtasLsYaeLDrrARavAvwLrarlapGdRvLLLhPaGpeFvaAyLgCLYAGIvAVPAPLPGgysherrRVvgIAaDagagaVLTdadteAeVreWIaEtGLpgLPVIAvDpIAadgDPgaWrpPgIradtVAvLQYTSGSTGsPKGVvVTHgNLLaNarsLsrsfgItedtvfGGWLPIyHDMGLfGILIPaLfIGatvVLMSPsAFIrRPhIWLrIIDRfgvvfSAAPDFAYDLCvRRVtDEQiAgLDLSRWRwAaNGSEPIrAaTIRaFaeRFApAGLRpeaLtPCYGLAEATIfVSgkSagPIrtrrVDpaaLEdHrfeeAvpGrpaREiVsCGrvpdIevRIVDPgtgrpLPdGaVGEIwLRGpSVaaGYWgrpEataetFgavtDGgDGPwLRTGDLGALYeGELYVTGRiKEILiVhGRNIYPhDiEhELRAaHdELagavGAaFaVpapGgGeEvIVVvHEVrprvpaDeIpaLAsAmRaTvaREFGvpaagVvLvRRGTVrRTTSGKvQRramReLFItGeLapvHaelgphlqaaaagearaatSIAPa Stv

[0048] where consensus sequence 3 is based on the ADLE polypeptide sequences of the DAPT, A410, 009H, 024A, RAMO and 023C lipopeptide loci as described herein above and residues 1 to 648 of the ADLF (as defined hereinafter) polypeptide sequence of the A541 lipopeptide locus. Consensus sequence 3 was generated as described above in relation to consensus sequences 1 and 2.

[0049] A “polynucleotide encoding an adenylating enzyme” or a “polynucleotide encoding ADLE” refers to a polynucleotide encoding a member of the ADLE family of proteins involved in N-acyl capping for lipopeptide biosynthesis. Representative polynucleotides encoding adenylating enzymes of the invention include the polynucleotide encoding the adenylating enzyme residing in the ramoplanin biosynthetic locus from Actinoplanes sp. ATCC 33076 (SEQ ID NO: 21), the polynucleotide encoding the adenylating enzyme residing in the A21978C locus in Streptomyces roseosporus NRRL 11379 (SEQ ID NO: 23), the polynucleotide encoding the adenylating enzyme residing in a lipopeptide biosynthetic locus in Streptomyces ghanaensis NRRL B-12104 (SEQ ID NO: 25), the polynucleotide encoding the adenylating enzyme residing in a lipopeptide biosynthetic locus in Streptomyces refuineus NRRL 3143 (SEQ ID NO: 27), the polynucleotide encoding the adenylating enzyme residing in a lipopeptide biosynthetic locus in Streptomyces aizunensis NRRL B-11277 (SEQ ID NO: 29), and the polynucleotide encoding the adenylating enzyme residing in a lipopeptide biosynthetic locus in Actinoplanes nipponensis FD 24834 ATCC 31145 (SEQ ID NO: 31). The nucleotide encoding an adenylating enzyme may be a portion of a gene encoding a fusion protein; for example, the nucleotide encoding the adenylating enzyme residing in the A54145 locus in Streptomyces fradiae ATCC 18158 is residues 1 to 1944 of the nucleotide encoding a fusion protein designated ADLF (SEQ ID NO: 33). The ADLE portion of the ADLF fusion protein is sometimes designated with an asterisk “★” in the figures.

[0050] The ADLE polypeptides of SEQ ID NOS: 24, 26, 28, 30, 31, 32, and residues 1 to 648 of SEQ ID NO: 34, i.e. the portion of the ADLF fusion protein representing an ADLE protein, were compared using the BLASTP algorithm with the default parameters to the sequences of the National Center for Biotechnology Information (NCBI) nonredundant protein database and to sequences of the DECIPHER® database of microbial genes, pathways and natural products (Ecopia BioSciences Inc., St-Laurent, Canada). The accession numbers of the top GenBank hits of this BLAST analysis are presented in Table 2 along with the corresponding E values.

TABLE 2

SEQ ID | Family | Locus | #aa | GenBank homology | probability | % identity | % similarity | proposed function of GenBank match
24 | ADLE | RAMO | 584aa | AAG02359.1, 2675aa | 1e−95 | 235/572 (41.08%) | 291/572 (50.87%) | peptide synthetase, Streptomyces verticillus
24 | ADLE | RAMO | 584aa | BAB69270.1, 1261aa | 2e−92 | 224/563 (39.79%) | 279/563 (49.56%) | non-ribosomal peptide synthetase/acyl-CoA dehydrogenase fusion, Streptomyces avermitilis
24 | ADLE | RAMO | 584aa | T18551, 1770aa | 2e−79 | 199/576 (34.55%) | 273/576 (47.4%) | saframycin Mx1 synthetase B, Myxococcus xanthus
26 | ADLE | DAPT | 600aa | BAB69270.1, 1261aa | 1e−92 | 206/477 (43.19%) | 263/477 (55.14%) | non-ribosomal peptide synthetase/acyl-CoA dehydrogenase fusion, Streptomyces avermitilis
26 | ADLE | DAPT | 600aa | T18551, 1770aa | 2e−87 | 201/540 (37.22%) | 273/540 (50.56%) | saframycin Mx1 synthetase B, Myxococcus xanthus
26 | ADLE | DAPT | 600aa | AAG02359.1, 2675aa | 5e−81 | 194/534 (36.33%) | 263/534 (49.25%) | peptide synthetase, Streptomyces verticillus
28 | ADLE | 009H | 594aa | BAB69270.1, 1261aa | 1e−115 | 244/577 (42.29%) | 316/577 (54.77%) | non-ribosomal peptide synthetase/acyl-CoA dehydrogenase fusion, Streptomyces avermitilis
28 | ADLE | 009H | 594aa | NP_630013.1, 2297aa | 3e−92 | 225/575 (39.13%) | 296/575 (51.48%) | polyketide synthase, Streptomyces coelicolor
28 | ADLE | 009H | 594aa | T18551, 1770aa | 3e−91 | 210/565 (37.17%) | 296/565 (52.39%) | saframycin Mx1 synthetase B, Myxococcus xanthus
30 | ADLE | 024A | 601aa | BAB69270.1, 1261aa | 1e−112 | 257/594 (43.27%) | 322/594 (54.21%) | non-ribosomal peptide synthetase/acyl-CoA dehydrogenase fusion, Streptomyces avermitilis
30 | ADLE | 024A | 601aa | NP_630013.1, 2297aa | 1e−103 | 261/640 (40.78%) | 338/640 (52.81%) | polyketide synthase, Streptomyces coelicolor
30 | ADLE | 024A | 601aa | AAN85512.1, 1745aa | 1e−102 | 242/578 (41.87%) | 307/578 (53.11%) | non-ribosomal peptide synthetase, Streptomyces atroolivaceus
32 | ADLE | 023C | 580aa | BAB69270.1, 1261aa | 1e−136 | 266/572 (46.5%) | 333/572 (58.22%) | non-ribosomal peptide synthetase/acyl-CoA dehydrogenase fusion, Streptomyces avermitilis
32 | ADLE | 023C | 580aa | ZP_00110248.1, 1204aa | 1e−112 | 226/560 (40.36%) | 318/560 (56.79%) | hypothetical protein, Nostoc punctiforme
32 | ADLE | 023C | 580aa | BAB69218.1, 574aa | 1e−109 | 230/572 (40.21%) | 314/572 (54.9%) | non-ribosomal peptide synthetase, Streptomyces avermitilis
34 | ADLE | A410 | 588aa | BAB69270.1, 1261aa | 1e−117 | 251/596 (42.11%) | 321/596 (53.86%) | non-ribosomal peptide synthetase/acyl-CoA dehydrogenase fusion, Streptomyces avermitilis
34 | ADLE | A410 | 588aa | AAG02359.1, 2675aa | 1e−103 | 233/588 (39.63%) | 315/588 (53.57%) | peptide synthetase, Streptomyces verticillus
34 | ADLE | A410 | 588aa | ZP_00124327.1, 4339aa | 1e−102 | 238/570 (41.75%) | 318/570 (55.79%) | hypothetical protein, Pseudomonas syringae
36 | ADLF | A541 | 723aa | BAB69270.1, 1261aa | 1e−108 | 261/629 (41.49%) | 320/629 (50.87%) | non-ribosomal peptide synthetase/acyl-CoA dehydrogenase fusion, Streptomyces avermitilis
36 | ADLF | A541 | 723aa | NP_251114.1, 4342aa | 1e−101 | 251/679 (36.97%) | 346/679 (50.96%) | non-ribosomal peptide synthetase, Pseudomonas aeruginosa
36 | ADLF | A541 | 723aa | AAN85512.1, 1745aa | 1e−101 | 261/684 (38.16%) | 344/684 (50.29%) | non-ribosomal peptide synthetase, Streptomyces atroolivaceus
36* | ADLE domain of A541_ADLF | A541 | 608aa | BAB69270.1, 1261aa | 1e−108 | 254/594 (42.76%) | 311/594 (52.36%) | non-ribosomal peptide synthetase/acyl-CoA dehydrogenase fusion, Streptomyces avermitilis
36* | ADLE domain of A541_ADLF | A541 | 608aa | AAN85512.1, 1745aa | 9e−99 | 238/581 (40.96%) | 309/581 (53.18%) | non-ribosomal peptide synthetase, Streptomyces atroolivaceus
36* | ADLE domain of A541_ADLF | A541 | 608aa | NP_251114.1, 4342aa | 6e−98 | 231/591 (39.09%) | 313/591 (52.96%) | non-ribosomal peptide synthetase, Pseudomonas aeruginosa

[0051] As used herein, the term acyl carrier protein or ACPH refers to a member of a family of proteins involved in N-acyl capping for lipopeptide biosynthesis. Representative acyl carrier proteins of the invention include the acyl carrier protein residing in the ramoplanin biosynthetic locus from Actinoplanes sp. ATCC 33076 (SEQ ID NO: 36), the acyl carrier protein residing in the A21978C locus in Streptomyces roseosporus NRRL 11379 (SEQ ID NO: 38), the acyl carrier protein residing in a lipopeptide biosynthetic locus in Streptomyces ghanaensis NRRL B-12104 (SEQ ID NO: 40), the acyl carrier protein residing in a lipopeptide biosynthetic locus in Streptomyces refuineus NRRL 3143 (SEQ ID NO: 42), the acyl carrier protein residing in a lipopeptide biosynthetic locus in Streptomyces aizunensis NRRL B-11277 (SEQ ID NO: 44), and the acyl carrier protein residing in a lipopeptide biosynthetic locus in Actinoplanes nipponensis FD 24834 ATCC 31145 (SEQ ID NO: 46). The acyl carrier protein may be a portion of a fusion protein; for example, the acyl carrier protein residing in the A54145 locus in Streptomyces fradiae ATCC 18158 is residues 649 to 743 of a fusion protein designated ADLF (SEQ ID NO: 34). The ACPH portion of the ADLF fusion protein is sometimes designated with a double asterisk “★★” in the figures.

[0052] The acyl carrier protein (ACPH) of the invention is defined structurally as a polypeptide sequence that produces an alignment with at least 50% identity to the following consensus sequence using the BLASTP 2.0.10 algorithm (with the filter option -F set to false, the gap opening penalty -G set to 11, the gap extension penalty -E set to 1, and all remaining options set to default values):

>Consensus sequence 4
MsdItappArhTPeelRaWLrecvAdyVgIppaeIatDvPLtdYGLDSVyalaLCAeiEDhIGievdptLLWDhPTIdeLsaaLaPrlarr

[0053] where consensus sequence 4 is based on the ACPH polypeptide sequences of the DAPT, A410, 009H, 024A, RAMO and 023C lipopeptide loci and residues 649 to 743 of the ADLF polypeptide sequence of the A541 lipopeptide locus. A “polynucleotide encoding an ACPH” is defined as a nucleotide sequence encoding an acyl carrier protein as defined above. Consensus sequence 4 was generated as described above in relation to consensus sequences 1 and 2.

[0054] A “polynucleotide encoding an acyl carrier protein” or a “polynucleotide encoding ACPH” refers to a polynucleotide encoding a member of the ACPH family of proteins involved in N-acyl capping for lipopeptide biosynthesis. Representative polynucleotides encoding acyl carrier proteins of the invention include the polynucleotide encoding the acyl carrier protein residing in the ramoplanin biosynthetic locus from Actinoplanes sp. ATCC 33076 (SEQ ID NO: 35), the polynucleotide encoding an acyl carrier protein residing in the A21978C locus in Streptomyces roseosporus NRRL 11379 (SEQ ID NO: 37), the polynucleotide encoding the acyl carrier protein residing in a lipopeptide biosynthetic locus in Streptomyces ghanaensis NRRL B-12104 (SEQ ID NO: 39), the polynucleotide encoding an acyl carrier protein residing in a lipopeptide biosynthetic locus in Streptomyces refuineus NRRL 3143 (SEQ ID NO: 41), the polynucleotide encoding an acyl carrier protein residing in a lipopeptide biosynthetic locus in Streptomyces aizunensis NRRL B-11277 (SEQ ID NO: 43), and the polynucleotide encoding the acyl carrier protein residing in a lipopeptide biosynthetic locus in Actinoplanes nipponensis FD 24834 ATCC 31145 (SEQ ID NO: 45). The polynucleotide encoding an acyl carrier protein may be a portion of a gene encoding a fusion protein; for example, the polynucleotide encoding the acyl carrier protein residing in the A54145 locus in Streptomyces fradiae ATCC 18158 is residues 1945 to 2229 of the polynucleotide encoding the fusion protein designated ADLF (SEQ ID NO: 33).

[0055] The ACPH polypeptides of SEQ ID NOS: 36, 38, 40, 42, 44 and 46, and residues 649 to 743 of SEQ ID NO: 34, i.e. the ACPH portion of the ADLF fusion protein, were compared using the BLASTP algorithm with the default parameters to the sequences of the National Center for Biotechnology Information (NCBI) nonredundant protein database and to sequences of the DECIPHER® database of microbial genes, pathways and natural products (Ecopia BioSciences Inc., St-Laurent, Canada). The accession numbers of the top GenBank hits of this BLAST analysis are presented in Table 3 along with the corresponding E values.

TABLE 3

SEQ ID | Family | Locus | #aa | GenBank homology | probability | % identity | % similarity | proposed function of GenBank match
36** | ACPH domain of A541_ADLF | A541 | 61aa | AAF62883.1, 7257aa | 5e−04 | 23/58 (39.66%) | 30/58 (51.72%) | epoD, Polyangium cellulosum
36** | ACPH domain of A541_ADLF | A541 | 61aa | AAF26920.1, 1832aa | 7e−04 | 22/58 (37.93%) | 29/58 (50%) | polyketide synthase, Polyangium cellulosum
36** | ACPH domain of A541_ADLF | A541 | 61aa | NP_488099.1, 126aa | 0.002 | 18/51 (35.29%) | 28/51 (54.9%) | unknown protein, Nostoc sp.
38 | ACPH | RAMO | 90aa | BAB69272.1, 88aa | 1e−17 | 41/82 (50%) | 56/82 (68.29%) | hypothetical protein, Streptomyces avermitilis
38 | ACPH | RAMO | 90aa | ZP_00107884.1, 105aa | 2e−09 | 28/76 (36.84%) | 49/76 (64.47%) | hypothetical protein, Nostoc punctiforme
38 | ACPH | RAMO | 90aa | NP_488099.1, 126aa | 2e−08 | 27/75 (36%) | 44/75 (58.67%) | unknown protein, Nostoc sp.
40 | ACPH | DAPT | 89aa | BAB69272.1, 88aa | 8e−10 | 28/81 (34.57%) | 47/81 (58.02%) | hypothetical protein, Streptomyces avermitilis
40 | ACPH | DAPT | 89aa | NP_488099.1, 126aa | 6e−07 | 26/82 (31.71%) | 41/82 (50%) | unknown protein, Nostoc sp.
40 | ACPH | DAPT | 89aa | ZP_00003083.1, 2544aa | 9e−06 | 26/84 (30.95%) | 48/84 (57.14%) | hypothetical protein, Nitrosomonas europaea
42 | ACPH | 009H | 90aa | BAB69272.1, 88aa | 1e−14 | 39/86 (45.35%) | 53/86 (61.63%) | hypothetical protein, Streptomyces avermitilis
42 | ACPH | 009H | 90aa | ZP_00107884.1, 105aa | 1e−09 | 31/74 (41.89%) | 45/74 (60.81%) | hypothetical protein, Nostoc punctiforme
42 | ACPH | 009H | 90aa | AAA03658.1, 506aa | 4e−09 | 31/78 (39.74%) | 44/78 (56.41%) | polyketide synthase, Anabaena sp.
44 | ACPH | 024A | 99aa | BAB69272.1, 88aa | 4e−09 | 29/77 (37.66%) | 43/77 (55.84%) | hypothetical protein, Streptomyces avermitilis
44 | ACPH | 024A | 99aa | NP_488099.1, 126aa | 2e−05 | 24/73 (32.88%) | 37/73 (50.68%) | unknown protein, Nostoc sp.
44 | ACPH | 024A | 99aa | AAF62883.1, 7257aa | 2e−04 | 24/71 (33.8%) | 37/71 (52.11%) | epoD, Polyangium cellulosum
46 | ACPH | 023C | 91aa | BAB69272.1, 88aa | 3e−18 | 42/75 (56%) | 53/75 (70.67%) | hypothetical protein, Streptomyces avermitilis
46 | ACPH | 023C | 91aa | ZP_00107884.1, 105aa | 2e−09 | 27/71 (38.03%) | 44/71 (61.97%) | hypothetical protein, Nostoc punctiforme
46 | ACPH | 023C | 91aa | BAB69217.1, 4809aa | 3e−07 | 26/69 (37.68%) | 43/69 (62.32%) | polyketide synthase, Streptomyces avermitilis
48 | ACPH | A410 | 88aa | BAB69272.1, 88aa | 8e−18 | 41/81 (50.62%) | 57/81 (70.37%) | hypothetical protein, Streptomyces avermitilis
48 | ACPH | A410 | 88aa | NP_488099.1, 126aa | 6e−12 | 34/75 (45.33%) | 48/75 (64%) | unknown protein, Nostoc sp.
48 | ACPH | A410 | 88aa | BAB69217.1, 4809aa | 3e−08 | 29/72 (40.28%) | 43/72 (59.72%) | polyketide synthase, Streptomyces avermitilis

[0056] As used herein, the term “ADLF” refers to a single open reading frame located in the A54145 locus (SEQ ID NO: 33), where the single open reading frame is formed by the genes encoding the ADLE and ACPH proteins fused together. The gene product of the open reading frame of SEQ ID NO: 33 is provided in SEQ ID NO: 34, wherein residues 1 to 648 of SEQ ID NO: 34 represent an ADLE protein and residues 649 to 743 of SEQ ID NO: 34 represent an ACPH protein. It is expected that a similar fusion of ADLE and ACPH homologues may occur in other lipopeptide biosynthetic loci. It is also expected that other permutations of fusion proteins involving protein families of the invention may be found in lipopeptide loci, for example a fusion of ADLE and ACPH and the acyl-specific C-domain or a fusion of ACPH and the acyl-specific C-domain.
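The residue ranges given for the ADLE and ACPH portions of ADLF correspond to the coding coordinates quoted elsewhere in the text (1 to 1944 and 1945 to 2229 of SEQ ID NO: 33). A minimal sketch of that conversion follows; it assumes, for illustration only, that the reading frame starts at nucleotide 1 and that coordinates are 1-based and inclusive. The function name is hypothetical.

```python
def residues_to_nucleotides(first_residue, last_residue):
    """Map a 1-based inclusive residue range to 1-based inclusive coding coordinates.

    Assumes the open reading frame begins at nucleotide 1, so residue k is
    encoded by nucleotides 3k-2 through 3k.
    """
    if first_residue < 1 or last_residue < first_residue:
        raise ValueError("invalid residue range")
    return (3 * (first_residue - 1) + 1, 3 * last_residue)


if __name__ == "__main__":
    print("ADLE portion:", residues_to_nucleotides(1, 648))
    print("ACPH portion:", residues_to_nucleotides(649, 743))
```

Under these assumptions the two ranges are contiguous, which is consistent with ADLF being a single open reading frame.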

[0057] Cosmid clones containing genes and proteins of the invention have been deposited with the International Depositary Authority of Canada, Bureau of Microbiology, Health Canada, 1015 Arlington Street, Winnipeg, Manitoba, Canada R3E 3R2 under the terms of the Budapest Treaty on the International Recognition of the Deposit of Microorganisms for Purposes of Patent Procedure. An E. coli DH10B strain harboring cosmid clone 008CH containing the ACPH gene and the acyl-specific C-domain in the biosynthetic locus for ramoplanin from Actinoplanes sp. ATCC 33076 was deposited on Sep. 19, 2001 and assigned accession number IDAC 190901-3. An E. coli DH10B strain harboring cosmid clone 008CO containing the ADLE gene in the biosynthetic locus for ramoplanin from Actinoplanes sp. ATCC 33076 was deposited on Sep. 19, 2001 and assigned accession number IDAC 190901-2. An E. coli DH10B strain harboring cosmid clone 024CK containing the ADLE and ACPH genes and the acyl-specific C-domain in the biosynthetic locus for the lipopeptide from Streptomyces refuineus subsp. thermotolerans was deposited on Feb. 26, 2002 and assigned accession number IDAC 260202-5. An E. coli DH10B strain harboring cosmid clone 184CM containing the ADLF fusion protein and the acyl-specific C-domain in the biosynthetic locus for A54145 lipopeptide from Streptomyces fradiae was deposited on Feb. 26, 2002 and assigned accession number IDAC 260202-1. The E. coli strain deposits are referred to herein as “the deposited strains”.

[0058] The sequences of the nucleotides encoding members of the protein families ADLE, ADLF, ACPH and the acyl-specific C-domains of the invention present in the deposited strains, as well as the amino acid sequences of the corresponding polypeptides, are controlling in the event of any conflict with any description of sequences herein. A license may be required to make, use or sell the deposited strains, nucleic acids therein or compounds derived therefrom, and no such license is hereby granted.

[0059] As used herein, the term “a polypeptide involved in lipopeptide synthesis” refers to any polypeptide as defined herein as an acyl-specific C-domain, or an adenylating enzyme, or an acyl carrier protein. A “polynucleotide involved in lipopeptide synthesis” refers to a nucleotide sequence encoding a polypeptide involved in lipopeptide synthesis as defined herein.

[0060] The term “isolated” means that the material is removed from its original environment, e.g. the natural environment if it is naturally-occurring. For example, a naturally-occurring polynucleotide or polypeptide present in a living organism is not isolated, but the same polynucleotide or polypeptide, separated from some or all of the coexisting materials in the natural system, is isolated. Such polynucleotides could be part of a vector and/or such polynucleotides or polypeptides could be part of a composition, and still be isolated in that such vector or composition is not part of its natural environment.

[0061] As used herein, “a condition of high stringency” refers to any one of the hybridization conditions described herein, and includes other “high stringency” conditions known in the art. In one condition, a polymer membrane containing immobilized denatured nucleic acids is first prehybridized for 30 minutes at 45° C. in a solution consisting of 0.9 M NaCl, 50 mM NaH₂PO₄, pH 7.0, 5.0 mM Na₂EDTA, 0.5% SDS, 10X Denhardt's, and 0.5 mg/ml polyriboadenylic acid. Approximately 2×10⁷ cpm (specific activity 4-9×10⁸ cpm/ug) of ³²P end-labeled oligonucleotide probe are then added to the solution. After 12-16 hours of incubation, the membrane is washed for 30 minutes at room temperature in 1×SET (150 mM NaCl, 20 mM Tris hydrochloride, pH 7.8, 1 mM Na₂EDTA) containing 0.5% SDS, followed by a 30 minute wash in fresh 1×SET at Tm-10° C. for the oligonucleotide probe, where Tm is the melting temperature of the probe. Stringency may be varied by conducting the hybridization at varying temperatures below the melting temperatures of the probes. The melting temperature of the probe may be calculated as follows. For oligonucleotide probes between 14 and 70 nucleotides in length, the melting temperature (Tm) in degrees Celsius may be calculated using the formula: Tm=81.5+16.6(log [Na+])+0.41(fraction G+C)−(600/N), where N is the length of the oligonucleotide. If the hybridization is carried out in a solution containing formamide, the melting temperature may be calculated using the equation Tm=81.5+16.6(log [Na+])+0.41(fraction G+C)−(0.63% formamide)−(600/N), where N is the length of the probe. For probes over 200 nucleotides in length, the hybridization may be carried out at 15-25° C. below the Tm. For shorter probes, such as oligonucleotide probes, the hybridization may be conducted at 5-10° C. below the Tm. Preferably, the hybridization is conducted in 6×SSC for shorter probes and in 50% formamide-containing solutions for longer probes.
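The melting temperature formulas above can be sketched in a few lines. The snippet below makes two assumptions that are not stated in the text: the logarithm is base 10, and the G+C and formamide terms are entered as percentages (0 to 100), which is the usual convention for the 0.41 and 0.63 coefficients. The function name and the example probe are hypothetical.

```python
import math


def melting_temperature(na_molar, gc_percent, length, formamide_percent=0.0):
    """Estimate probe Tm in degrees Celsius.

    Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%G+C) - 0.63*(%formamide) - 600/N

    na_molar: sodium ion concentration in mol/L (assumed log base 10).
    gc_percent, formamide_percent: percentages between 0 and 100 (assumed units).
    length: probe length N in nucleotides.
    """
    if na_molar <= 0:
        raise ValueError("na_molar must be positive")
    if length <= 0:
        raise ValueError("length must be positive")
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            - 0.63 * formamide_percent
            - 600.0 / length)


if __name__ == "__main__":
    # Hypothetical 20-mer with 50% G+C in 0.9 M NaCl, without formamide.
    tm = melting_temperature(0.9, 50.0, 20)
    print(f"Tm = {tm:.2f} C; oligonucleotide hybridization window "
          f"{tm - 10:.2f} to {tm - 5:.2f} C")
```

The printed window applies the 5-10° C. offset given above for shorter probes; for probes over 200 nucleotides the 15-25° C. offset would be used instead.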

[0062] As used herein, the term “homology” refers to the optimal alignment of sequences (either nucleotides or amino acids), which may be conducted by computerized implementations of algorithms. “Homology”, with regard to polynucleotides, for example, may be determined by analysis with BLASTN version 2.0 using the default parameters. “Homology”, with respect to polypeptides (i.e., amino acids), may be determined using a program, such as BLASTP version 2.2.2 with the default parameters, which aligns the polypeptides or fragments being compared and determines the extent of amino acid identity or similarity between them. It will be appreciated that amino acid “homology” includes conservative substitutions, i.e. those that substitute a given amino acid in a polypeptide by another amino acid of similar characteristics. Typically seen as conservative substitutions are the following replacements: replacement of an aliphatic amino acid such as Ala, Val, Leu and Ile with another aliphatic amino acid; replacement of a Ser with a Thr or vice versa; replacement of an acidic residue such as Asp or Glu with another acidic residue; replacement of a residue bearing an amide group, such as Asn or Gln, with another residue bearing an amide group; exchange of a basic residue such as Lys or Arg with another basic residue; and replacement of an aromatic residue such as Phe or Tyr with another aromatic residue. A “homology of 70% or higher” includes a homology of, for example, 70%, 75%, 80%, 85%, 90%, 95%, and up to 100% (identical) between two or more nucleotide or amino acid sequences. A “homology of at least 45%” includes a homology of, for example, 45%, 50%, 60%, 70%, 80%, 90%, and up to 100% (identical) between two or more nucleotide or amino acid sequences.
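The conservative replacement classes listed above can be expressed as a small lookup. The sketch below encodes only the residues named in the text, using one-letter codes; it is an illustration of the listed classes, not the substitution matrix that BLASTP uses to score similarity, and all names in it are hypothetical.

```python
# Residue classes named in the text, as one-letter amino acid codes.
CONSERVATIVE_GROUPS = (
    frozenset("AVLI"),  # aliphatic: Ala, Val, Leu, Ile
    frozenset("ST"),    # Ser and Thr
    frozenset("DE"),    # acidic: Asp, Glu
    frozenset("NQ"),    # amide-bearing: Asn, Gln
    frozenset("KR"),    # basic: Lys, Arg
    frozenset("FY"),    # aromatic: Phe, Tyr
)


def is_conservative(residue_a, residue_b):
    """Return True if replacing residue_a with residue_b stays within one class.

    Identical residues are not a substitution and return False.
    """
    a, b = residue_a.upper(), residue_b.upper()
    if a == b:
        return False
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


if __name__ == "__main__":
    for pair in (("L", "I"), ("S", "T"), ("D", "K")):
        print(pair, is_conservative(*pair))
```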

[0063] The present invention provides a method for detecting a polypeptide involved in lipopeptide biosynthesis or a polynucleotide encoding such a polypeptide.

[0064] In one embodiment, the method of the present invention provides one or more reference sequences and compares a candidate sequence (either a specific single candidate sequence or a candidate database sequence) with the one or more reference sequences. The sequence homology is determined for the sequences compared. A candidate sequence sharing at least 45% homology to one or more reference sequences is considered to be a candidate polypeptide or a candidate polynucleotide encoding a candidate polypeptide which is involved in lipopeptide biosynthesis. Preferably, a candidate polypeptide sequence sharing at least 45% homology to consensus sequence 1 or 2 is considered a candidate acyl-specific C-domain polypeptide; a candidate polypeptide sequence sharing at least 55% homology to consensus sequence 3 is considered a candidate adenylating enzyme; and a candidate polypeptide sequence sharing at least 50% homology to consensus sequence 4 is considered a candidate acyl-carrier protein. The involvement of these identified sequences in lipopeptide biosynthesis may be confirmed by first expressing the polypeptide from the polynucleotide candidates and performing the function analysis according to methods known in the art and as described herein in Examples 1-2.
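
The threshold rules in the paragraph above can be expressed as a small lookup. The following Python sketch is illustrative only; the names `RULES` and `classify_candidate` and the dictionary keys `consensus_1` to `consensus_4` are hypothetical, and the percent homology values are assumed to have been computed beforehand (for example with BLASTP).

```python
# Each rule: (candidate class, consensus sequences to check, minimum percent homology).
RULES = [
    ("acyl-specific C-domain", ("consensus_1", "consensus_2"), 45.0),
    ("adenylating enzyme", ("consensus_3",), 55.0),
    ("acyl carrier protein", ("consensus_4",), 50.0),
]


def classify_candidate(homology):
    """Return the candidate classes whose homology threshold is met.

    `homology` maps a consensus name to the percent homology of the
    candidate against that consensus; missing entries count as 0.
    """
    labels = []
    for label, consensus_names, cutoff in RULES:
        if any(homology.get(name, 0.0) >= cutoff for name in consensus_names):
            labels.append(label)
    return labels
```

A candidate meeting none of the thresholds yields an empty list; any hit remains a candidate only until confirmed by functional analysis.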

[0065] In another embodiment of the invention, the subject method compares one or more reference sequences against sequences within a candidate database of a specific organism. This will determine whether the specific organism may contain a polypeptide involved in lipopeptide biosynthesis or a polynucleotide encoding such a polypeptide. If it is determined that a specific organism may contain such a polynucleotide sequence encoding a polypeptide for lipopeptide biosynthesis, proteins from the candidate database (e.g., a part of the whole genome sequence) may be expressed and analyzed according to methods known in the art and as described herein in Examples 1-2.

[0066] In a preferred embodiment, the reference sequences used in the subject method are selected from the group consisting of polynucleotide or polypeptide sequences representing: an acyl-specific C-domain, an ADLE, an ACPH, and an ADLF in one or more of the biosynthetic loci selected from the group consisting of RAMO, DAPT, A541, 009H, 024A, 023C, A410, 070B and CADA.

[0067] In another preferred embodiment, the reference sequences may further include one or more reference polypeptides having at least 45% sequence homology to SEQ ID NO: 1 or SEQ ID NO: 2, one or more reference polypeptides having at least 55% sequence homology to SEQ ID NO: 3, one or more reference polypeptides having at least 50% sequence homology to SEQ ID NO: 4, or one or more reference polynucleotides encoding such polypeptide sequences.

[0068] Also within the scope of the present invention are a memory system for storing data that can be accessed by a computer, a computer readable medium comprising a computer program and data for sequence comparison, and a computer system for performing sequence comparison of the present invention.

[0069] The computer system of the present invention will provide one or more reference polynucleotide or polypeptide sequences selected from the group consisting of polynucleotide or polypeptide sequences representing an acyl-specific C-domain, an adenylating enzyme (ADLE), an acyl carrier protein (ACPH), or a fusion of the two (ADLF) in one or more of the biosynthetic loci selected from RAMO, DAPT, A541, 009H, 024A, 023C, A410, 070B and CADA.

[0070] Additionally or alternatively, the computer system of the present invention will provide one or more reference polypeptides comprising the consensus sequences of the present invention, i.e. one or more reference polypeptides having at least 45% sequence homology to SEQ ID NO: 1 or SEQ ID NO: 2, one or more reference polypeptides having at least 55% sequence homology to SEQ ID NO: 3, one or more reference polypeptides having at least 50% sequence homology to SEQ ID NO: 4, or one or more reference polynucleotides encoding such polypeptide sequences.

[0071] The computer system of the invention may also provide candidate polynucleotide or polypeptide sequence(s). The candidate polynucleotide or polypeptide may exist as a specific single sequence or it may be a candidate database, e.g., a part of the entire genome sequence of an organism, or protein family sequences.

[0072] The computer system of the invention will perform sequence comparison between one or more candidate sequences and one or more reference sequences. The computer system will also determine the level of homology of two or more sequences compared and identify a candidate sequence which shares at least 45% homology with SEQ ID NO: 1 or SEQ ID NO: 2, and in some embodiments additionally identify a candidate sequence which shares at least 55% homology with SEQ ID NO: 3 or a candidate sequence which shares at least 50% homology with SEQ ID NO: 4.

[0073] The memory and computer system of the present invention permit the quick development of methods to search candidate databases and individual candidate sequences for their sequence homology against one or more reference sequences. In addition, the memory and computer system of the present invention will also permit the prediction of protein sequences from polynucleotide sequences, the prediction of homologous protein domains between two or more polypeptides, and the analysis of structure and function from sequence data.

[0074] The computer may be programmed to implement a process for effecting the identification, analyses, or modeling of a sequence of a polypeptide or a polynucleotide. In one embodiment the memory of the present invention contains data representing a polypeptide with 70% sequence homology to any one sequence selected from the group consisting of: SEQ ID NOs. 1, 2, 3, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, and 48. The preferred process by which data for a source database according to the present invention may be obtained is illustrated in FIGS. 12 and 13.

[0075] One use of the memory and computer system involves studying an organism's genome (e.g., database candidate sequences) to determine the sequence homology between the polynucleotide/polypeptide sequences in the genome and one or more reference polynucleotide or polypeptide sequences. Such information is of significant interest in assessing whether an organism contains a lipopeptide biosynthesis locus or any polynucleotide/polypeptide involved in lipopeptide biosynthesis.

[0076] Another use of the memory and computer system involves studying one or more specific candidate sequences to determine the sequence homology between the specific candidate polynucleotide/polypeptide sequences and one or more reference polynucleotide or polypeptide sequences. Such information helps to determine whether the specific candidate sequence is involved in lipopeptide biosynthesis.

[0077] Where a specific polynucleotide candidate sequence or polynucleotide database candidate sequences are being analyzed, the memory and computer system may permit the prediction of an Open Reading Frame (ORF) from a candidate sequence. The ORF corresponds to a nucleotide sequence which could potentially be translated into a polypeptide. Such a stretch of sequence is uninterrupted by a stop codon. An ORF that represents the coding sequence for a full protein generally begins with an ATG “start” codon and terminates with one of the three “stop” codons. For the purposes of this application, an ORF may be any part of a coding sequence, with or without start and/or stop codons. For an ORF to be considered as a good candidate for coding for a bona fide cellular protein, a minimum size requirement is often set, for example, a stretch of DNA that would code for a protein of 50 amino acids or more.
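
The ORF criteria described above (an ATG start, no intervening stop codon, one of three stop codons at the end, and a minimum length) can be illustrated with a short Python sketch. It is a simplification under stated assumptions: only the three forward reading frames are scanned, only ATG is accepted as a start codon, and the start codon is counted toward the minimum of 50 codons. The name `find_orfs` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=50):
    """Return (start, end, frame) for each forward-strand ORF.

    `start` is the index of the ATG, `end` is one past the stop codon,
    and an ORF is kept only if it spans at least `min_codons` codons
    (start codon included, stop codon excluded).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None                     # look for the next start
    return orfs
```

A fuller implementation would also scan the reverse complement and could report partial ORFs lacking a start or stop codon, which the paragraph above also treats as ORFs.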

[0078] To make the above sequence information manipulation easy to perform and understand, sophisticated computer database systems may be used. In one embodiment, the reference sequences are electronically recorded and annotated with information available from public sequence databases. Examples of such databases include GenBank (NCBI) and the Comprehensive Microbial Resource database (The Institute for Genomic Research). The resulting information is stored in a relational database that may be employed to determine homologies between the reference sequences and genes within and among genomes.

[0079] To identify homologies between the sequences, one or more sequence alignment algorithms such as BLAST (Basic Local Alignment Search Tool) or FAST (using the Smith-Waterman algorithm) may be employed. In a particularly preferred embodiment, these two alignment protocols are used in combination. Both of these algorithms look for regions of similarity between two sequences; the Smith-Waterman algorithm is generally more tolerant of gaps, and is used to provide a higher resolution match after the BLAST search provides a preliminary match. These algorithms determine (1) alignment between similar regions of the two sequences, and (2) a percent identity between sequences. For example, alignment may be calculated by matching, base-by-base or amino acid-by-amino acid, the regions of substantial similarity.
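
Once an alignment is available, the percent identity mentioned above can be computed by counting matching columns. The Python sketch below is illustrative and is not the BLAST or Smith-Waterman implementation itself. It assumes the two sequences have already been aligned with "-" as the gap character, and it takes the number of aligned columns as the denominator, which is only one of several conventions in use. The name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue                 # column empty in both sequences; ignore it
        columns += 1
        if x == y:                   # identical residues (never gap vs gap here)
            matches += 1
    if columns == 0:
        return 0.0
    return 100.0 * matches / columns
```

The same function works base-by-base for nucleotides or residue-by-residue for amino acids.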

[0080] FIG. 11 is a block diagram of a computer system according to one embodiment of the invention. The system shown in FIG. 11 for performing the sequence comparison processing of the invention may be a general purpose computer used alone or in connection with a specialized processing computer. Such processing may be performed by a single platform or by a distributed processing platform. In addition, such processing and functionality can be implemented in the form of special purpose hardware or in the form of software being run by a general purpose computer. Any data handled in such processing or created as a result of such processing can be stored in a temporary memory, such as in the RAM of a given computer system or subsystem. In addition, or in the alternative, such data may be stored in longer-term storage devices, for example, magnetic disks, rewritable optical disks and so on. For purposes of the disclosure herein, computer-readable media may comprise any form of data storage mechanism, including such existing memory technologies as well as hardware or circuit representations of such structures and of such data.

[0081] The computer system 40 (FIG. 11) may include an operating system (e.g., UNIX) on which runs a relational database management system, a World Wide Web application, and a World Wide Web server. The software on the computer system may assume numerous configurations. For example, it may be provided on a single machine or distributed over multiple machines.

[0082] The World Wide Web application includes the executable code necessary for generation of database language statements [e.g., Structured Query Language (SQL) statements]. Generally, the executables will include embedded SQL statements. In addition, the World Wide Web application may include a configuration file which contains pointers and addresses to the various software entities that comprise the server as well as the various external and internal databases which must be accessed to service user requests. The configuration file also directs requests for server resources to the appropriate hardware, as may be necessary should the server be distributed over two or more separate computers.

[0083] A World Wide Web browser may be used for providing a user interface 10 (FIG. 11). Through the Web browser, a user may construct search requests for retrieving data from a sequence database and/or a genomic database. Thus, the user will typically point and click on user interface elements such as buttons, pull down menus, scroll bars, etc. conventionally employed in graphical user interfaces. The requests so formulated with the user's Web browser are transmitted to a Web application which formats them to produce a query that can be employed to extract the pertinent information from sequence databases or genomic databases.

[0084] When network 40 employs a World Wide Web server, it supports the TCP/IP protocol. Local networks such as this are sometimes referred to as “Intranets.” An advantage of such Intranets is that they allow easy communication with public domain databases residing on the World Wide Web (e.g., the GenBank World Wide Web site). Thus, in a particularly preferred embodiment of the present invention, users can directly access data (via Hypertext links for example) residing on Internet databases using an HTML interface provided by Web browsers and Web servers.

EXAMPLE 1

[0085] Conserved Genes and Proteins Involved in N-acylation in Lipopeptides

[0086] The acyl-specific C-domains and ADLE, ADLF and ACPH protein families of the invention were discovered by identifying, characterizing and comparing several full-length biosynthetic loci, each producing a lipopeptide of known structure and each residing in a microorganism reported to produce the lipopeptide of known structure.

[0087] RAMO: Ramoplanin is a lipopeptide produced by Actinoplanes sp. ATCC 33076 (see U.S. Pat. No. 4,303,646). Ramoplanin is a glycosylated lipodepsipeptide of known structure (see, for example, U.S. Pat. No. 4,427,656). The full-length biosynthetic locus for ramoplanin from Actinoplanes sp. (RAMO) was cloned and sequenced (FIG. 1a). The open reading frames in RAMO were identified and a function was attributed to each protein encoded by the open reading frames. RAMO is described in detail in co-pending U.S. application Ser. No. 09/976,059 and in PCT international application PCT/CA01/01462, published as WO 02/31155.

[0088] DAPT: A21978C is a lipopeptide produced by Streptomyces roseosporus. The structure of A21978C is known. While some progress has been reported towards elucidation of the biosynthetic locus responsible for the production of A21978C in Streptomyces roseosporus (DAPT), the full locus was not known. Transposon mutagenesis techniques had been performed to locate DAPT [McHenney et al. (1998) J. Bact. Vol. 180 pp. 143-151] and DNA fragments derived therefrom had been used for insertional mutagenesis experiments that demonstrated inactivation of A21978C production. Analysis of the DNA sequence of the fragments revealed the presence of NRPS genes involved in the biosynthesis of A21978C. These genetic and biological data demonstrated beyond doubt that the identified pathway was indeed responsible for A21978C expression. However, the full biosynthetic locus for A21978C had not been reported.

[0089] The method used to clone DAPT, a partial locus formed of seven complete and one partial open reading frames (ORFs) (FIG. 1a), is disclosed in U.S. Ser. No. 60/342,133. Actinomycetes generally produce lipopeptides using NRPS proteins and a number of the ORFs discovered corresponded to NRPS proteins. Moreover, one of the NRPS ORFs discovered contained the partial NRPS sequences previously demonstrated to be part of the A21978C locus, thereby confirming the identity of DAPT. The module and domain organization analysis of ORFs designated 7 to 9 in U.S. Ser. No. 60/342,133 is consistent with that expected for biosynthesis of A21978C as described in detail in U.S. Ser. No. 60/342,133. The nature and order of the amino acid residues specified by ORFs 7 to 9 coincide with the exact chemical structure of A21978C (see Table 3 and FIG. 1 of U.S. Ser. No. 60/342,133). This analysis, as described in detail in U.S. Ser. No. 60/342,133, demonstrates beyond doubt that DAPT is indeed the biosynthetic locus for A21978C from S. roseosporus.

[0090] A541: Streptomyces fradiae strain NRRL 18158 was known to produce the lipopeptide antibiotic complex A54145 of known structure. However, the biosynthetic locus for A54145 in Streptomyces fradiae (A541) was not known. We cloned, sequenced and annotated A541, as disclosed in detail in U.S. Ser. No. 60/342,133, U.S. Ser. No. 60/372,789 and in co-pending U.S. Ser. No. 10/XXX,XXX filed concurrently with the present application and also claiming priority from U.S. Ser. No. 60/342,133 and U.S. Ser. No. 60/372,789. The contents of U.S. Ser. No. 10/XXX,XXX are incorporated herein in their entirety for all purposes.

[0091] A541 contains three complete and one partial NRPS genes (FIG. 1b). Analysis of the NRPS ORFs revealed the presence of conserved domains involved in the recognition, activation, modification and condensation of amino acids. A total of 13 modules responsible for the condensation of 13 amino acid residues were identified, as expected given that A54145 is composed of 13 amino acids. The adenylation domains were examined in order to determine the specificity of the amino acids that they activate and tether to the cognate thiolation domain of the NRPS. The nature and order of the amino acid residues specified by the NRPS ORFs exactly correspond to the nature and order of the amino acid residues found in the A54145 chemical structure (see Table 4 and FIG. 2 of U.S. Ser. No. 60/372,789). A methylation domain of ORF 8, module 5 as disclosed in U.S. Ser. No. 60/372,789 specifying the amino acid glycine corresponds to the amino acid incorporated in the fifth position of A54145, which is an N-methylated glycine (sarcosine). The nature and order of the amino acids specified by the NRPS genes as well as the presence of domains involved in the modification of some of the amino acids confirm that A541 is indeed the biosynthetic locus for A54145 in S. fradiae.

[0092] RAMO, DAPT and A541 were analyzed and compared. All three loci contain NRPS loading modules that begin with a condensation domain instead of the conventional adenylation-thiolation domains (FIGS. 1a and b, SEQ ID NOS: 6, 8 and 10 respectively). Such modules would generally be considered not to be capable of initiating peptide assembly on the assumption that the C-domain would likely interfere with this initiation process (see, for example, Linne and Marahiel, 2000, Biochemistry, Vol. 39, pp. 10439-10447). The nucleotide sequences of the members of the conserved family of unusual NRPS C-domains in RAMO, DAPT and A541 are disclosed as SEQ ID NOS: 5, 7 and 9 respectively. The polypeptide sequences of the members of the conserved family of unusual NRPS C-domains in RAMO, DAPT and A541 are disclosed as SEQ ID NOS: 6, 8 and 10 respectively.

[0093] These C-domains were assessed by computer comparison with proteins found in the GenBank database of protein sequences (National Center for Biotechnology Information, National Library of Medicine, Bethesda, Md., USA) using the BLASTP algorithm (Altschul et al., supra) and the results are presented in Table 1. Amino acid sequence comparison analysis indicates that the RAMO, DAPT and A541 C-domains are related to condensation domains found in other lipopeptide-encoding NRPS systems.

[0094] The RAMO, DAPT and A541 C-domains were also compared to a collection of condensation domains from various lipopeptide NRPSs obtained from GenBank or disclosed herein. FIG. 2 shows the evolutionary relatedness of these C-domains. Apart from RAMO, DAPT and A541, FIG. 2 refers to additional lipopeptide biosynthetic loci by way of four-letter designations, wherein CADA is the biosynthetic locus for the calcium-dependent antibiotic, FENG is the biosynthetic locus for fengycin, SURF is the biosynthetic locus for surfactin, SYRI is the biosynthetic locus for syringomycin, SERR is the biosynthetic locus for serrawettin, LICH is the biosynthetic locus for lichenysin, ITUR is the biosynthetic locus for iturin, and MYSU is the biosynthetic locus for mycosubtilin. All C-domains included in this analysis are full-length C domains. The convention used to identify and distinguish C domains in FIG. 2 is as follows. Those NRPS C-domain sequences that were obtained from the GenBank database are denoted by accessions beginning with three letters followed by digits (usually five). These first eight characters identifying each of the C domains correspond to the GenBank accession number. The lower case “n” serves to denote “NRPS domain”, and the “CD” followed by two digits denotes “C domain” and its number relative to the other C domains contained on that polypeptide sequence. For example “AAC80285nCD06|SYRI” represents the amino acid sequence corresponding to the sixth C domain contained on the GenBank entry AAC80285 for an NRPS from the syringomycin biosynthetic locus. The NRPS C domain sequences that are disclosed for the first time in this application, in U.S. provisional patent application Ser. No. 60/342,133 or U.S. patent application Ser. No. 09/976,059 follow a similar nomenclature (nCD00) but are denoted by nine-character accessions beginning with three numbers.
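
The labelling convention described above lends itself to mechanical parsing. The Python sketch below is illustrative only and covers just the GenBank-style labels (three letters, five digits, “n”, “CD”, two digits, a vertical bar and a four-character locus code), as in the example AAC80285nCD06|SYRI. The nine-character in-house accessions are not handled because their exact layout is not spelled out here. The names `LABEL_PATTERN` and `parse_domain_label` are hypothetical.

```python
import re

# Assumed layout: <3 letters><5 digits> n CD <2 digits> | <4-character locus>
LABEL_PATTERN = re.compile(
    r"^(?P<accession>[A-Z]{3}\d{5})n(?P<kind>CD)(?P<index>\d{2})\|(?P<locus>[A-Z0-9]{4})$"
)


def parse_domain_label(label):
    """Split a FIG. 2 style C-domain label into its components."""
    match = LABEL_PATTERN.match(label)
    if match is None:
        raise ValueError("not a GenBank-style C-domain label: " + label)
    return {
        "accession": match.group("accession"),   # GenBank accession number
        "domain_index": int(match.group("index")),  # position among C domains on the polypeptide
        "locus": match.group("locus"),           # four-character locus designation
    }
```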

[0095] Analysis of a Clustal alignment of the C-domains clearly shows that these domains are evolutionarily related to C-domains found in the starter modules of known N-acylated lipopeptides such as calcium-dependent antibiotic (CADA) (FIG. 1e, domain 22), surfactin (SURF), syringomycin (SYRI) and mycosubtilin (MYCO) among others (FIG. 2). Moreover, these special C-domains are significantly evolutionarily distant from regular condensation domains found in NRPSs that catalyze amide bond formation and condensation between two adjacent amino acids (FIG. 2). Alignment of these unusual C-domains demonstrates the conservation of motifs and specific amino acid residues important for their catalytic activity (FIG. 3). Based on these observations, the unusual C-domains are considered to catalyze N-acyl peptide linkages between a fatty acid and the amino terminal group of an amino acid.

[0096] A conserved family of activating enzymes (ADLE) was also found to be common to RAMO, DAPT and A541, although the gene encoding the activating enzyme in A541 was fused together with the gene encoding an acyl-carrier protein to form a single ORF (ADLF). The nucleotide sequences of the members of the conserved family of activating enzymes in RAMO, DAPT and A541 are disclosed as SEQ ID NOS: 23, 25 and 35 respectively. The polypeptide sequences of these activating enzymes are disclosed as SEQ ID NOS: 24, 26 and 36 respectively. The ADLE activating enzyme portion of the ADLF fusion protein is referred to as SEQ ID NO: 36★.

[0097] A conserved family of acyl carrier proteins (ACPH) was also found to be common to RAMO, DAPT and A541, although the gene encoding the acyl carrier protein in A541 was fused together with the gene encoding the activating enzyme to form a single ORF (ADLF). The nucleotide sequences of the members of the conserved family of acyl carrier proteins in RAMO, DAPT and A541 are disclosed as SEQ ID NOS: 37, 39 and 35 respectively. The polypeptide sequences of these acyl carrier proteins are disclosed as SEQ ID NOS: 38, 40 and 36 respectively. The ACPH acyl carrier portion of the ADLF fusion protein is referred to as SEQ ID NO: 36★★.

[0098] The biological function of the ADLE, ADLF and ACPH ORFs was assessed by amino acid sequence similarity analysis. The ADLE family of proteins shows similarity to various acyl CoA ligase enzymes whereas the ACPH family of proteins has sequence similarities to acyl carrier proteins found in the acyl-condensing polyketide synthase enzymatic systems (Tables 2 and 3). Clustal alignment of ADLE ORFs shows the conservation of domains and residues important for their enzymatic function (FIG. 4). Alignment of ACPH ORFs shows their overall sequence conservation and the absolute conservation of the serine residue that is modified by phosphopantetheinylation to form the active holo-acyl carrier protein (FIG. 5). Both ADLE and ACPH protein families are evolutionarily closely related to corresponding protein families from other lipopeptide loci (FIG. 6).

[0099] The ADLE and ACPH proteins as well as the acyl-specific C-domains of the invention are widely conserved throughout the biosynthetic loci of structurally diverse lipopeptides, including glycosylated lipopeptides and acidic lipopeptides. The only structural feature common to ramoplanin, A21978C and A54145 is a peptide backbone appended with a fatty acyl group at the N-terminal amino acid residue. Based on these correlations, the ADLE and ACPH proteins and the unusual C-domain are considered to be responsible for activating and tethering fatty acyl groups and catalyzing the formation of the N-acyl peptide linkage.

EXAMPLE 2

[0100] Biosynthesis of N-acylated Peptides:

[0101] Despite the significant overall evolutionary distance between the lipopeptide-producing microorganisms described in this invention, they all contain closely related C-domains that are used for peptide N-acylation, a step which doubles as the peptide chain initiation step. Without intending to be limited to any particular biosynthetic scheme or mechanism of action, the ADLE, ACPH and unusual NRPS C-domain of the present invention can explain formation of the N-acyl peptide linkage found in lipopeptides. FIG. 7 illustrates a mechanism for NRPS chain initiation in which the fatty acyl group primes the synthesis of the peptide by the NRPS. CoA-linked fatty acyl precursors are channeled from the primary metabolic pool and modified while still attached to CoA by accessory enzymes such as oxidoreductases, epoxidases, desaturases, etc. encoded by genes of primary metabolism or by genes within the biosynthetic locus. The mature fatty acyl-CoA intermediate is then recognized by the cognate adenylating enzyme and transferred onto the phosphopantetheinyl prosthetic arm of the free holo-ACP, releasing CoA-SH and utilizing ATP in the process. It is alternatively contemplated that the adenylating enzyme may recognize free fatty acyl substrate(s) and transfer them onto the phosphopantetheinyl prosthetic arm of the free holo-ACP, utilizing ATP in the process. Once the fatty acyl group is tethered onto the free holo-ACP, the C domain of the first module carries out a reaction in which the carbonyl group of the activated fatty acyl is condensed with the amino group of the amino acid substrate that had been previously activated and tethered by the first module of the NRPS. Hence, peptide chain initiation and N-acylation are closely coupled. Subsequent peptide elongation and termination steps can then proceed as with typical NRPS modules.

[0102] FIG. 8 illustrates the above-described amino acid N-acylation mechanism using specific examples in known lipopeptide biosynthetic pathways. In ramoplanin biosynthesis, an ADLE enzyme activates specific fatty acid moieties and subsequently tethers them onto the phosphopantetheinyl prosthetic arm of the ACPH (disclosed herein as SEQ ID NOS: 24 and 38 respectively). The carbonyl group of the activated fatty acyl is then condensed to the amino group of the asparagine residue (Asn) that had been previously activated by and tethered to the first module of the NRPS. The condensation reaction is catalyzed by the acyl-specific C-domain, disclosed herein as SEQ ID NO: 6, of the first module of the NRPS (FIGS. 1a and 8).

[0103] In another example, biosynthesis of the acylated peptide chain of antibiotic A54145 is initiated by activation and tethering of specific fatty acid units onto the ACPH component of the ADLF protein disclosed herein as SEQ ID NO: 36. ADLF represents the fusion of the two protein families, ADLE and ACPH, required for activation of fatty acids in lipopeptide biosynthesis. Once the fatty acid is activated, the acyl-specific C-domain of the first module, disclosed herein as SEQ ID NO: 10, catalyzes the condensation of the carbonyl group of the fatty acyl and the amino group of the tryptophan residue (Trp) that had been previously activated by and tethered to the first module of the NRPS (FIGS. 1b and 8).

[0104] The same mechanism for peptide N-acylation may be present in other microorganisms. Evidence supporting this hypothesis includes the fact that other lipopeptide NRPS enzymes that have been identified in very diverse microorganisms contain a specialized C domain in the first module. Examples include the syringomycin biosynthetic locus from Pseudomonas syringae pv. syringae (Guenzi et al. (1998) J. Biol. Chem. Vol. 273, pp. 32857-32863); the serrawettin W2 biosynthetic locus from Serratia liquefaciens MG1 (Lindum et al. (1998) Vol 180, pp. 6384-6388); the fengycin biosynthetic loci from Bacillus subtilis b213 and A1/3 (Steller et al. (1999) Chem. Biol. Vol. 6, pp. 31-41); the surfactin biosynthetic locus from Bacillus subtilis; the lichenysin biosynthetic locus from Bacillus licheniformis (Konz et al. (1999) J. Bact. Vol. 181, pp. 133-140); and the “calcium-dependent antibiotic” (CADA) biosynthetic locus from Streptomyces coelicolor A3(2) (Hojati et al. (2002) Chem. Biol. Vol. 9, pp. 1175-1187). The CADA biosynthetic locus does not apparently have an adenylating enzyme homologue but it does contain a free acyl carrier protein that may participate together with the unusual C domain of the first NRPS module in the N-acylation mechanism. Therefore, certain fatty acids may require specialized enzymes to transfer the fatty acyl moiety onto the acyl carrier protein, but once tethered onto the free acyl carrier protein the mechanism is analogous to that outlined in FIG. 7. Notably, the fatty acyl moiety of CDA is unique in that it contains an epoxy modification. Hence such fatty acids may be transferred onto the ACP by some other specialized enzyme.

[0105] It is possible that the N-acylation mechanism of the present invention extends beyond bacteria to even more diverse microorganisms such as lower eukaryotes and other organisms. For example, the fungi Aspergillus nidulans var. roseus, Glarea lozoyensis, and Aspergillus japonicus var. aculeatus are known to produce the antifungal lipopeptides echinocandin B, pneumocandin B0, and aculeacin A, respectively (Hino et al. (2001) Journal of Industrial Microbiology and Biotechnology Vol 27, pp. 157-162). Based on the overall similarity between fungal and bacterial NRPS systems and on the fact that we have shown that very diverse NRPS systems employ the same mechanism of N-acylation, the mechanism of peptide N-acylation described in this invention is likely to be operative in these and/or other lipopeptide-producing lower eukaryotes as well.

[0106] Although the disclosed mechanism for peptide N-acylation is apparently widespread among very diverse microorganisms, it is not the only means by which lipopeptides can be generated. For example, the lipopeptides mycosubtilin and iturin A produced by Bacillus subtilis ATCC and RB14, respectively, are each assembled by multifunctional hybrid polypeptides comprising fused fatty acid synthase, aminotransferase, and NRPS activities (Duitman et al. (1999) Proc. Natl. Acad. Sci USA. Vol. 96, pp. 13294-13299; Tsuge et al. (2001) J. Bact. Vol. 183, pp. 6265-6273). This alternative mechanism of peptide N-acylation may be more evolutionarily restricted as, to the best of our knowledge, it has been identified only in members of the genus Bacillus, and the lipopeptides produced by these biosynthetic loci are members of a distinct sub-group of lipopeptides that contain a β-amino fatty acyl moiety linked to the amino terminus of the peptide core. Despite the fact that this mechanism of N-acylation does not involve the action of ADLE and ACPH homologues, the C-domains that condense the β-amino fatty acyl moiety to the first amino acid of both mycosubtilin and iturin are found to cluster within the highlighted group of acyl-specific C-domains as shown in FIG. 2.

[0107] The widespread N-acylation mechanism for peptide natural products provides a knowledge-based approach for discovery and identification of lipopeptide biosynthetic loci in microorganisms. The highly conserved nucleotide sequences that are distinguishing signatures of the adenylating enzyme, the acyl carrier protein, and/or the specialized C-domain involved in the N-acylation mechanism can be identified and utilized as probes to screen libraries of microbial genomic DNA for the purpose of rapidly identifying, isolating, and characterizing lipopeptide biosynthetic loci in microorganisms of interest. The sequences of ADLE, ACPH proteins and the acyl-specific C-domain can also be used for in silico screening of large collections of microorganisms. Such a genetic-based screen has the added advantage over traditional fermentation approaches in that organisms having the genetic potential to produce lipopeptide natural products can be identified without the laborious fermentation, isolation, and characterization of the lipopeptide natural product. In addition, those organisms that normally produce lipopeptides only at very low or undetectable amounts or those organisms that only produce lipopeptides under very specialized growth conditions can nevertheless be readily identified using this genetic approach.

EXAMPLE 3

[0108] Identification of Putative Lipopeptide Biosynthetic Locus 009H:

[0109] The sequences of the ADLE, ACPH and the acyl-specific C-domain were used in silico to screen a proprietary database of bacterial secondary metabolism loci, DECIPHER® (Ecopia BioSciences Inc; CA 2,352,451). To facilitate sequence comparisons, a protein domain database was generated that is part of the DECIPHER® database and comprises domains from multimodular proteins such as NRPSs and polyketide synthases, as well as equivalent domains found in non-modular proteins.

[0110] Protein sequences from loci RAMO, DAPT and A541 corresponding to acyl-specific C-domains, disclosed as SEQ ID NOS: 6, 8 and 10 respectively, ADLE ORFs, disclosed as SEQ ID NOS: 24, 26 and 36★, and ACPH ORFs, disclosed as SEQ ID NOS: 38, 40 and 36★★, were compared to the DECIPHER® domain database using the BLASTP algorithm (Altschul et al., supra). Moreover, consensus sequences from the acyl-specific C-domain, the ADLE and ACPH proteins, generated using the HMMER software package as described herein and disclosed as SEQ ID NOS: 1, 2, 3 and 4, were also compared to the DECIPHER® domain database.

[0111] Determination of sequence homology is assisted by the E value, which indicates whether two sequences display sufficient similarity to justify an inference of homology. An E value of 0.00 indicates a perfect homolog. The E values are calculated as described in Altschul et al. 1990, J. Mol. Biol. 215(3): 403-410 and in Altschul et al. 1993, Nature Genetics 3: 226-272.
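As a rough illustration of how an E value relates to an alignment score, the following Python sketch applies the standard Karlin-Altschul relation E = m × n × 2^(−S′), where S′ is the normalized (bit) score, m the query length and n the database length. The function names and the numeric inputs are hypothetical and are not taken from the DECIPHER® searches; BLAST implementations additionally apply effective-length corrections that are omitted here.

```python
import math


def expected_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Karlin-Altschul expectation E = m * n * 2**(-S') for a bit score S'.

    Uses raw lengths; real BLAST runs use effective (edge-corrected) lengths.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def bit_score_for(e_value: float, query_len: int, db_len: int) -> float:
    """Inverse relation: the bit score that would give the stated E value."""
    return math.log2(query_len * db_len / e_value)


# Hypothetical example: a 435-residue query against a 1,000,000-residue database.
example_e = expected_value(bit_score=200.0, query_len=435, db_len=1_000_000)
```

Because E falls exponentially with the score, each additional bit of score halves the expected number of chance matches.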

[0112] Comparison analysis of acyl-specific C-domain sequences with sequences from over 450 loci in the DECIPHER® database revealed the presence of a condensation domain, disclosed herein as SEQ ID NO: 12, that is included in locus 009H found in Streptomyces ghanaensis (NRRL B-12104). Table 4 shows that SEQ ID NO: 12 shows higher sequence similarity with sequences from the acyl-specific C-domains of RAMO, DAPT and A541 (that condense an acyl group to the amino terminal group of an amino acid) than with a typical NRPS condensation domain that catalyzes joining of two amino acids, as exemplified by the C-domain of the first module found in the ramoplanin ORF13 as described in detail in PCT/CA01/01462.

TABLE 4

SEQ ID NO   Probing Domain           E value    Target Domain SEQ ID NO
1           Consensus C1             4e−54      12
2           Consensus C2             1e−115     12
6           RAMO C-domain            4e−44      12
8           DAPT C-domain            4e−75      12
10          A541 C-domain            1e−65      12
—           RAMO ORF13, C-domain     4e−11      12
3           Consensus ADLE           0.00       28
24          RAMO ADLE                1e−118     28
26          DAPT ADLE                1e−141     28
36*         A541 ADLE                1e−141     28
4           Consensus ACPH           2e−22      42
38          RAMO ACPH                4e−15      42
40          DAPT ACPH                6e−15      42
36**        A541 ACPH                3e−07      42
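The comparison drawn from Table 4 can be made explicit with a short Python sketch. It takes the C-domain E values reported in Table 4 for SEQ ID NO: 12 and expresses, in orders of magnitude, how much better each acyl-specific probe scores than the typical C-domain of ramoplanin ORF13. The variable and function names are illustrative assumptions; only the E values come from the table.

```python
import math

# E values from Table 4 for acyl-specific probes against SEQ ID NO: 12.
acyl_specific_hits = {
    "Consensus C1": 4e-54,
    "Consensus C2": 1e-115,
    "RAMO C-domain": 4e-44,
    "DAPT C-domain": 4e-75,
    "A541 C-domain": 1e-65,
}

# E value from Table 4 for the typical C-domain (RAMO ORF13, first module).
typical_c_domain_e = 4e-11


def orders_of_magnitude_better(e_probe: float, e_reference: float) -> float:
    """Difference in log10(E); positive means the probe hit is more significant."""
    return math.log10(e_reference) - math.log10(e_probe)


margins = {
    name: orders_of_magnitude_better(e, typical_c_domain_e)
    for name, e in acyl_specific_hits.items()
}

# The probe with the smallest margin is the most conservative comparison.
weakest_probe = min(margins, key=margins.get)
```

Every acyl-specific probe in Table 4 gives a smaller E value than the ORF13 C-domain, which is the basis for assigning SEQ ID NO: 12 to the acyl-specific group.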

[0113] Similarly, ADLE domains with SEQ ID NOS: 3, 24, 26 and 36★ as well as ACPH domains with SEQ ID NOS: 4, 38, 40 and 36★★ were compared to the DECIPHER® domain database. Comparison analysis indicated the presence of proteins with high sequence homology to ADLE and ACPH sequences, disclosed as SEQ ID NOS: 28 and 42 respectively, also found in the 009H locus. The relatedness of SEQ ID NOS: 12, 28 and 42 to acyl-specific C-domains, ADLE and ACPH proteins was further confirmed by clustal sequence alignment showing the conservation of specific protein domains and by phylogenetic analysis (FIGS. 3-6).

[0114] Closer inspection of locus 009H shows the presence of 4 NRPS ORFs composed of 13 modules (FIG. 1b). The first NRPS ORF begins with the acyl-specific C-domain (SEQ ID NO: 12) instead of a typical adenylation domain. The ADLE and ACPH proteins (SEQ ID NOS: 28 and 42, respectively) are found in close proximity to the NRPS carrying the acyl-specific C-domain, indicating that all three enzymes are part of the same biosynthetic locus. The simultaneous presence of these three enzymes along with the N-terminal location of the acyl-specific C-domain and the presence of a multienzymatic NRPS complex is consistent with the biosynthesis of an N-acylated lipopeptide, specified by locus 009H.

EXAMPLE 4

[0115] Identification of Putative Lipopeptide Biosynthetic Locus 023C

[0116] In silico screening of the DECIPHER® database with consensus protein sequences and with sequences from loci RAMO, DAPT and A541 corresponding to acyl-specific C-domains, disclosed as SEQ ID NOS: 1, 2, 6, 8 and 10 respectively, further revealed the presence of an acyl-specific C-domain in locus 023C present in Streptomyces aizunensis NRRL B-11277. As shown in Table 5, sequence comparison analysis demonstrates that the 023C acyl-specific C-domain, disclosed herein as SEQ ID NO: 16, is more closely related to the N-acyl capping C-domains from RAMO, DAPT and A541 than to typical NRPS condensation domains represented by the C-domain of the first module found in the ramoplanin ORF13 as described in detail in PCT/CA01/01462.

TABLE 5

SEQ ID NO   Probing Domain           E value    Target Domain SEQ ID NO
1           Consensus C1             1e−152     16
2           Consensus C2             4e−53      16
6           RAMO C-domain            6e−82      16
8           DAPT C-domain            8e−45      16
10          A541 C-domain            1e−32      16
—           RAMO ORF13, C-domain     3e−09      16
3           Consensus ADLE           0.00       32
24          RAMO ADLE                1e−126     32
26          DAPT ADLE                1e−146     32
36*         A541 ADLE                1e−134     32
4           Consensus ACPH           1e−29      46
38          RAMO ACPH                2e−16      46
40          DAPT ACPH                9e−16      46
36**        A541 ACPH                3e−07      46

[0117] Proteins related to the ADLE and ACPH families of proteins, disclosed herein as SEQ ID NOS: 32 and 46, were also found in locus 023C (Table 5). The relatedness of SEQ ID NOS: 16, 32 and 46 to acyl-specific C-domains, ADLE and ACPH proteins was further confirmed by clustal alignment showing the conservation of specific protein domains and amino acid residues important for catalytic activity (FIGS. 3-5) and by phylogenetic analysis (FIG. 6).

[0118] Analysis of locus 023C shows the presence of 6 NRPS ORFs composed of 28 modules (FIG. 1c). The first NRPS ORF begins with the acyl-specific C-domain (SEQ ID NO: 16) indicative of the N-acyl capping mechanism (FIG. 7). Moreover, ADLE and ACPH proteins involved in fatty acid activation and tethering (SEQ ID NOS: 32 and 46 respectively) are also found in the 023C locus near the NRPS ORF, demonstrating that locus 023C is likely to encode an N-acylated lipopeptide metabolite.

EXAMPLE 5

[0119] Identification of Putative Lipopeptide Biosynthetic Locus 024A:

[0120] Screening of the DECIPHER® database through protein homology analysis with sequences corresponding to acyl-specific C-domains (SEQ ID NOS: 1, 2, 6, 8 and 10) revealed the presence of an acyl-specific C-domain in locus 024A found in Streptomyces refuineus NRRL 3143. As shown in Table 6, BLASTP analysis demonstrates that the 024A-encoded C-domain (SEQ ID NO: 14) is more closely related to domains condensing acyl groups to amino acids than to domains condensing two amino acids, as exemplified by the C-domain of the first module found in the ramoplanin ORF13 as described in detail in PCT/CA01/01462.

TABLE 6

SEQ ID NO   Probing Domain           E value    Target Domain SEQ ID NO
1           Consensus C1             2e−50      14
2           Consensus C2             1e−150     14
6           RAMO C-domain            3e−43      14
8           DAPT C-domain            9e−83      14
10          A541 C-domain            0.00       14
—           RAMO ORF13, C-domain     4e−16      14
3           Consensus ADLE           0.00       30
24          RAMO ADLE                1e−112     30
26          DAPT ADLE                1e−155     30
36*         A541 ADLE                0.00       30
4           Consensus ACPH           6e−23      44
38          RAMO ACPH                4e−12      44
40          DAPT ACPH                2e−12      44
36**        A541 ACPH                5e−32      44

[0121] ADLE and ACPH related proteins, disclosed herein as SEQ ID NOS: 30 and 44, were also found in locus 024A (Table 6). Sequence alignments of all three proteins (SEQ ID NOS: 14, 30 and 44) show conservation of domains and amino acid residues important for catalytic activity of the corresponding enzymes (FIGS. 3-5). Additionally, these proteins are evolutionarily related to members of the acyl-specific C-domains, ADLE and ACPH families of proteins as indicated by phylogenetic analysis (FIG. 6).

[0122] Analysis of the 024A complete locus (FIG. 1c and U.S. Ser. No. 60/342,133, U.S. Ser. No. 60/372,789 and co-pending U.S. Ser. No. 10/XXX,XXX) reveals the presence of 4 NRPS ORFs composed of 13 modules. Consistent with an N-acyl peptide capping mechanism, the acyl-specific C-domain (SEQ ID NO: 14) is located at the N-terminal position of the first NRPS ORF. Moreover, the ADLE and ACPH ORFs (SEQ ID NOS: 30 and 44 respectively) are immediately adjacent to the acyl-specific C-domain, suggesting a functional interaction between the three proteins. Based on these observations, locus 024A was predicted and subsequently proven to direct the biosynthesis of an N-acylated lipopeptide (see Example 8).

EXAMPLE 6

[0123] Identification of Lipopeptide 41,012 Biosynthetic Locus A410:

[0124] Protein homology comparison of sequences specifying acyl-specific C-domains (SEQ ID NOS: 1, 2, 6, 8 and 10) with sequences found in the DECIPHER® database revealed the presence of a related C-domain, disclosed herein as SEQ ID NO: 18, in locus A410 found in Actinoplanes nipponensis Routien ATCC 31145. This microorganism has been shown to synthesize an acidic polypeptide antibiotic of undetermined chemical structure, compound 41,012, that belongs to the amphomycin group of N-acylated lipopeptides (U.S. Pat. No. 4,001,397). As shown in Table 7, BLASTP demonstrates that the A410-encoded C-domain (SEQ ID NO: 18) is more closely related to domains condensing acyl groups to amino acids than to domains condensing two amino acids, as exemplified by the C-domain of the first module found in the ramoplanin ORF13 as described in detail in PCT/CA01/01462.

TABLE 7

SEQ ID NO   Probing Domain           E value    Target Domain SEQ ID NO
1           Consensus C1             9e−70      18
2           Consensus C2             1e−121     18
6           RAMO C-domain            3e−64      18
8           DAPT C-domain            5e−75      18
10          A541 C-domain            3e−62      18
—           RAMO ORF13, C-domain     1e−13      18
3           Consensus ADLE           0.00       34
24          RAMO ADLE                1e−111     34
26          DAPT ADLE                1e−137     34
36*         A541 ADLE                1e−141     34
4           Consensus ACPH           4e−31      48
38          RAMO ACPH                1e−14      48
40          DAPT ACPH                5e−14      48
36**        A541 ACPH                2e−10      48

[0125] ADLE and ACPH related proteins, disclosed herein as SEQ ID NOS: 34 and 48, were also found in locus A410 (Table 7). Sequence alignments of all three proteins (SEQ ID NOS: 18, 34 and 48) show the conservation of domains and amino acid residues important for catalytic activity of these enzymes (FIGS. 3-5). Additionally, these proteins are evolutionarily related to members of the acyl-specific C-domains, ADLE and ACPH families of proteins as indicated by phylogenetic analysis (FIG. 6).

[0126] Locus A410 specifies 3 NRPS ORFs composed of 11 modules (FIG. 1d). Consistent with an N-acyl peptide capping mechanism, the acyl-specific C-domain (SEQ ID NO: 18) is located at the N-terminal position of the first NRPS ORF. Moreover, the ADLE and ACPH ORFs (SEQ ID NOS: 34 and 48 respectively) are found adjacent to the acyl-specific C-domain, indicating that locus A410 specifies an N-acylated lipopeptide consistent with the described characteristics of antibiotic compound 41,012.

EXAMPLE 7

[0127] Identification of Putative Lipopeptide Biosynthetic Locus 070B:

[0128] In silico screening of the DECIPHER® database with sequences corresponding to acyl-specific C-domains (SEQ ID NOS: 1, 2, 6, 8 and 10) revealed the presence of an acyl-specific C-domain in locus 070B found in Streptomyces sp. (Ecopia BioSciences, strain 070). As shown in Table 8, BLASTP analysis demonstrates that the 070B-encoded C-domain (SEQ ID NO: 20) is more closely related to domains condensing acyl groups to amino acids than to domains condensing two amino acids, as exemplified by the C-domain of the first module found in the ramoplanin ORF13 as described in detail in PCT/CA01/01462.

TABLE 8

SEQ ID NO   Probing Domain           E value    Target Domain SEQ ID NO
1           Consensus C1             1e−153     20
2           Consensus C2             2e−67      20
6           RAMO C-domain            3e−82      20
8           DAPT C-domain            3e−47      20
10          A541 C-domain            6e−48      20
—           RAMO ORF13, C-domain     9e−16      20

[0129] Sequence alignment of the 070B acyl-specific C-domain (SEQ ID NO: 20) with related domains from various lipopeptide biosynthetic ORFs shows conservation of domains and amino acid residues important for catalytic activity of these enzymes (FIG. 3). Additionally, this protein is evolutionarily related to members of the acyl-specific C-domains as indicated by phylogenetic analysis (FIG. 6).

[0130] In contrast to the other loci presented herein, ADLE and ACPH related proteins were not detected in 070B.

[0131] Analysis of the 070B locus found in the DECIPHER® database shows the presence of an incomplete NRPS ORF composed of three modules (FIG. 1d). Consistent with the biosynthesis of an N-acylated lipopeptide, the acyl-specific C-domain is located at the N-terminus of the NRPS ORF. The lack of ADLE and ACPH sequences can be attributed to the fact that the sequence of the locus is not yet complete. Alternatively, 070B may be similar to the CADA locus in Streptomyces coelicolor A3(2), which specifies an N-acylated lipopeptide and lacks ADLE and ACPH related enzymes. Despite the potential absence of ADLE and ACPH in 070B, the presence and location of the acyl-specific C-domain clearly indicates that 070B specifies an N-acylated lipopeptide.
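The reasoning applied to loci 009H, 023C, 024A, A410 and 070B in Examples 3 to 7 can be summarized as a simple decision rule. The Python sketch below is one possible encoding of that rule, not a description of the DECIPHER® software: the `Locus` class, the `predict` function and the label strings are hypothetical names introduced for illustration, while the ORF counts, module counts and presence or absence of ADLE and ACPH are those reported above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    name: str
    acyl_c_domain_at_n_terminus: bool  # acyl-specific C-domain starts the first NRPS ORF
    has_adle: bool                     # adenylating enzyme detected in the locus
    has_acph: bool                     # acyl carrier protein detected in the locus
    nrps_orfs: int
    modules: int


def predict(locus: Locus) -> str:
    """Illustrative rule: the N-terminal acyl-specific C-domain is the key marker;
    ADLE and ACPH, when present, add supporting evidence."""
    if not locus.acyl_c_domain_at_n_terminus:
        return "no N-acyl capping evidence"
    if locus.has_adle and locus.has_acph:
        return "N-acylated lipopeptide (C-domain, ADLE and ACPH present)"
    return "N-acylated lipopeptide (C-domain only)"


# Values as reported in Examples 3 to 7; 070B is an incomplete locus.
LOCI = [
    Locus("009H", True, True, True, nrps_orfs=4, modules=13),
    Locus("023C", True, True, True, nrps_orfs=6, modules=28),
    Locus("024A", True, True, True, nrps_orfs=4, modules=13),
    Locus("A410", True, True, True, nrps_orfs=3, modules=11),
    Locus("070B", True, False, False, nrps_orfs=1, modules=3),
]

PREDICTIONS = {locus.name: predict(locus) for locus in LOCI}
```

Under this rule 070B is still flagged as a candidate lipopeptide locus, mirroring the argument that the position of the acyl-specific C-domain is informative even when ADLE and ACPH are not detected.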

EXAMPLE 8

[0132] Biosynthesis of an N-acylated Lipopeptide by Locus 024A:

[0133] Locus 024A in Streptomyces refuineus subsp. thermotolerans NRRL 3143 was shown to possess several characteristics of an N-acylated lipopeptide encoding locus, namely the presence of an acyl-specific C-domain (SEQ ID NO: 14) located at the N-terminus of the first NRPS ORF involved in the assembly of the polypeptide, ADLE and ACPH family proteins (SEQ ID NOS: 30 and 44 respectively) as well as an NRPS multienzymatic system composed of 13 modules (see Example 5 and FIG. 1c).

[0134] Protein homology analysis of the acyl-specific C-domain, the ADLE and the ACPH proteins with other proteins in the DECIPHER® database indicated a high homology of these proteins with corresponding proteins found in the A541 locus (SEQ ID NOS: 10, 36★ and 36★★) that specifies production of antibiotic A54145 in Streptomyces fradiae NRRL 18158 (Table 6 in Example 5). Closer inspection of the two loci revealed the presence of an identical NRPS system that could be responsible for the synthesis of a 024A polypeptide scaffold identical to that of A54145 (FIGS. 1b and c and U.S. Ser. No. 60/342,133, U.S. Ser. No. 60/372,789 and co-pending U.S. Ser. No. 10/XXX,XXX).

[0135] Based on these observations and on the fact that there are known growth conditions for expressing lipopeptide A54145 in Streptomyces fradiae (U.S. Pat. No. 4,977,083), Streptomyces refuineus subsp. thermotolerans was grown under identical culture conditions to assess possible induction of locus 024A and determine the nature of the specified product.

[0136] Streptomyces fradiae and Streptomyces refuineus subsp. thermotolerans were grown at 30° C. for 48 hours in a rotary shaker in 25 mL of a seed medium consisting of glucose (10 g/L), potato starch (30 g/L), soy flour (20 g/L), Pharmamedia (20 g/L), and CaCO₃ (2 g/L) in tap water. Five mL of this seed culture was used to inoculate 500 mL of production media in a 4 L baffled flask. Production media consisted of glucose (25 g/L), soy grits (18.75 g/L), blackstrap molasses (3.75 g/L), casein (1.25 g/L), sodium acetate (8 g/L), and CaCO₃ (3.13 g/L) in tap water, and the fermentation proceeded for 7 days at 30° C. on a rotary shaker. The production culture was centrifuged and filtered to remove mycelia and solid matter. The pH was adjusted to 6.4 and 46 mL of Diaion HP20 was added and stirred for 30 minutes. HP20 resin was collected by Buchner filtration and washed successively with 140 mL water and 90 mL 15% CH₃CN/H₂O, and the wash was discarded. HP20 resin was then eluted with 140 mL 50% CH₃CN/H₂O (fraction HP20 E2). This pool was passed over a 5 mL Amberlite IRA68 column (acetate cycle) and the flow through (fraction IRA FT) was reserved for bioassay. The column was washed with 25 mL 50% CH₃CN/H₂O and eluted with 25 mL 50% CH₃CN/H₂O containing 0.1 N HOAc (fraction IRA E1), and then eluted with 25 mL 50% CH₃CN/H₂O containing 1.0 N HOAc (fraction IRA E2). Biological activity was followed during purification by bioassay with Micrococcus luteus in Nutrient Agar containing 5 mM CaCl₂.
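For convenience, the amounts of each medium component needed for the stated culture volumes follow directly from the concentrations given above. The Python sketch below performs this arithmetic; the dictionary and function names are hypothetical, and only the concentrations and volumes are taken from the protocol.

```python
# Concentrations in g/L, as stated in the protocol.
SEED_MEDIUM = {
    "glucose": 10.0,
    "potato starch": 30.0,
    "soy flour": 20.0,
    "Pharmamedia": 20.0,
    "CaCO3": 2.0,
}
PRODUCTION_MEDIUM = {
    "glucose": 25.0,
    "soy grits": 18.75,
    "blackstrap molasses": 3.75,
    "casein": 1.25,
    "sodium acetate": 8.0,
    "CaCO3": 3.13,
}


def grams_needed(recipe_g_per_l: dict, volume_ml: float) -> dict:
    """Scale a g/L recipe to the mass (in grams) required for volume_ml."""
    litres = volume_ml / 1000.0
    return {component: conc * litres for component, conc in recipe_g_per_l.items()}


seed_masses = grams_needed(SEED_MEDIUM, 25.0)                # 25 mL seed culture
production_masses = grams_needed(PRODUCTION_MEDIUM, 500.0)   # 500 mL production culture

# Inoculum size: 5 mL of seed culture into 500 mL of production medium.
inoculum_fraction = 5.0 / 500.0
```

The 5 mL inoculum corresponds to 1% (v/v) of the production volume.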

[0137] FIG. 9a is a photograph of a plate generated during extraction of an anionic lipopeptide from Streptomyces fradiae. FIG. 9a shows an enrichment of activity based on IRA67 anion exchange chromatography consistent with expression of an acidic lipopeptide. This activity is concentrated during the extraction procedure as indicated by the increased diameter of lysis rings. A54145 was detected via HPLC/MS in fraction IRA E2 as evidenced by mass ion ES²⁺=830.5 consistent with the structures of A54145C,D (U.S. Pat. No. 4,994,270).

[0138] FIG. 9b is a photograph of a plate generated during a similar extraction scheme performed on extracts from Streptomyces refuineus subsp. thermotolerans. FIG. 9b shows a similar enrichment of activity based on IRA67 anion exchange chromatography consistent with expression of an acidic lipopeptide. This activity is concentrated during the extraction procedure as indicated by the increased diameter of lysis rings. A mass ion of ES²⁺=830.5, identical to that of A54145, was present in fraction IRA E2, confirming that an N-acylated acidic lipopeptide, identical to A54145C,D, is produced by 024A in Streptomyces refuineus subsp. thermotolerans.
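To relate the observed doubly charged ion to a molecular mass, the following Python sketch converts an m/z value to a neutral mass. It assumes that the ES²⁺ signal corresponds to a doubly protonated species, [M+2H]²⁺, which is an interpretation not stated explicitly above; the function name is hypothetical and the proton mass is the standard value of about 1.007276 Da.

```python
PROTON_MASS_DA = 1.007276  # mass of a proton in daltons


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass M for an [M + zH]^z+ ion observed at the given m/z.

    Assumes the charge is carried entirely by added protons.
    """
    return charge * (mz - PROTON_MASS_DA)


# Observed ion reported for fraction IRA E2 from both strains.
mass_from_ira_e2 = neutral_mass(830.5, charge=2)
```

Under this assumption the ion at 830.5 corresponds to a neutral mass of roughly 1659 Da for the compound detected in both Streptomyces fradiae and Streptomyces refuineus subsp. thermotolerans.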

EXAMPLE 9

[0139] Use of the N-acyl Capping Cassette to Engineer Peptide Synthetases Capable of Producing Novel Lipopeptides

[0140] The availability and understanding of lipopeptide N-acyl capping components increases the potential of redesigning (un)natural products by engineered peptide synthetases. It has been demonstrated that, using known molecular biology techniques, functional hybrid peptide synthetases may be engineered that are capable of producing rationally designed peptide products (Mootz et al. (2000) Proc. Natl. Acad. Sci. U.S.A. Vol 97 pp. 5848-5853). Moreover, it has been postulated that through domain swapping, change-of-substrate specificity by mutagenesis, and an induced termination to achieve release of a defined shortened product, it may be possible to obtain a recombinant NRPS system that produces antipain, a potent cathepsin inhibitor produced by Streptomyces roseus and whose biosynthetic machinery is unknown (Doekel S, Marahiel MA. (2001) Metab. Eng. Vol 3 pp. 64-77). Mootz et al. (supra) described genetic engineering using an NRPS system to produce a peptide product that is not a naturally occurring product, and Doekel and Marahiel (supra) described a prophetic example of engineering an NRPS system to make the known natural product antipain.

[0141] The following outlines a strategy whereby the NRPS biosynthetic machinery of a nonlipopeptide natural product, complestatin, can be modified so as to produce an N-acylated analogue of complestatin (FIG. 10).

[0142] Streptomyces lavendulae produces complestatin, a cyclic peptide natural product that antagonizes pharmacologically relevant protein-protein interactions including formation of the C4b,2b complex in the complement cascade and gp120-CD4 binding in the HIV life cycle. Complestatin, a member of the vancomycin group of natural products, consists of an alpha-ketoacyl hexapeptide backbone modified by oxidative phenolic couplings and halogenations. The entire complestatin biosynthetic and regulatory gene cluster spanning ca. 50 kb was cloned and sequenced (Chiu et al. (2001) Proc. Natl. Acad. Sci. U.S.A. Vol 98 pp. 8548-8553). It includes four NRPS genes, comA, comB, comC, and comD (FIG. 10, panel a). The comA gene encodes an NRPS that is composed of a loading module that incorporates hydroxyphenylglycine (HPG; or a derivative thereof) followed by a module that incorporates tryptophan (Trp), the first two residues of complestatin. Through domain swapping, the loading module and the C domain of the tryptophan-incorporating module can be replaced by one of the acyl-specific C-domains disclosed herein. Preferably, the acyl-specific C-domain of the A541, DAPT, or 024A loci would be used, as these domains are naturally specific for condensing an acyl moiety to a tryptophan residue. In addition to this domain swapping, the ADLE and ACPH genes would also be introduced into the system so as to provide a means to generate activated acyl substrates that can be used by the acyl-specific C domain. Thus, FIG. 10b depicts a rationally designed recombinant NRPS system that should give rise to N-acylated complestatin analogue(s). The recombinant NRPS system depicted in FIG. 10b could be employed either in vivo, using an appropriate recombinant host, or in vitro, using purified enzymes supplemented with the appropriate substrates.

[0143] One approach whereby N-acylated complestatin analogue(s) could be generated in vivo would involve the use of Streptomyces lavendulae, the complestatin producer, as the host strain. Briefly, the N-acyl capping cassette would replace the comA gene. This could be accomplished either by inactivation of the comA gene on the Streptomyces lavendulae chromosome followed by the introduction of a plasmid expressing the ADLE, ACPH, and the recombinant ComA derivative, or by physically replacing, by way of a double recombination (Keiser et al., supra), the comA gene on the Streptomyces lavendulae chromosome by a cassette containing genes encoding the ADLE, ACPH, and the recombinant ComA derivative. The resulting recombinant strains could be further modified to include genes involved in the biosynthesis of the acyl moieties and/or could be provided acyl moieties or precursors thereof in the fermentation medium.

[0144] One approach whereby N-acylated complestatin analogue(s) could be generated in vitro would involve the over-expression of the ADLE, ACPH, recombinant ComA, ComB, ComC, and ComD polypeptides in an appropriate host, for example E. coli, followed by the preparation of an extract or purified fraction thereof and use of said preparation together with appropriate substrates as outlined in Mootz et al. (2000). It is expected that, in the absence of accessory proteins, the product produced by this in vitro system might not contain certain modifications such as the cross-linking of residues that is catalyzed by specific complestatin cytochrome P450 enzymes.

[0145] All patents, patent applications, and published references cited herein are hereby incorporated by reference in their entirety. While this invention has been particularly shown and described with reference to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention encompassed by the appended claims.

1 48 1 435 PRT Artificial Sequence HMMer software generated consensussequence 1 Gly Gly Leu Arg Glu Leu Met Ala Gly Gln Leu Ala Val Trp HisAla 1 5 10 15 Gln Gln Leu Ala Pro Glu Asn Pro Val Tyr Asn Val Gly GluTyr Val 20 25 30 Glu Ile Asp Gly Glu Val Asp Leu Asp Leu Leu Val Ala AlaVal Arg 35 40 45 Arg Val Met Glu Glu Ala Asp Ala Ala Arg Leu Arg Phe ArgGlu Val 50 55 60 Asp Gly Val Pro Arg Gln Tyr Phe Ala Glu Asp Glu Asp TyrPro Val 65 70 75 80 Glu Val Ile Asp Val Ser Ala Glu Ala Asp Pro Arg AlaAla Ala Glu 85 90 95 Ser Leu Met Ala Ala Asp Leu Arg Arg Pro Arg Asp LeuArg Asp Gly 100 105 110 Glu Leu Tyr Thr Gln Lys Ile Tyr Lys Val Gly GluAsp Leu Val Phe 115 120 125 Trp Tyr Gln Arg Ala His His Ile Ile Leu AspGly Arg Ser Ala Gly 130 135 140 Leu Val Ala Ser Arg Val Ala Ala Val TyrSer Ala Leu Ala Ala Gly 145 150 155 160 Gly Asp Val Glu Glu Gly Ala LeuPro Ser Ser Ser Val Leu Met Asp 165 170 175 Ala Glu Asp Glu Tyr Arg AlaSer Glu Glu Phe Glu Leu Asp Arg Glu 180 185 190 Tyr Trp Arg Glu Ala LeuAla Gly Leu Pro Glu Glu Val Ser Leu Gly 195 200 205 Ala Asn Glu Pro SerArg Leu Pro Arg Glu Pro Val Arg His Glu Glu 210 215 220 Asp Val Ser AspAla Ala Ala Ala Glu Leu Arg Ala Ala Ala Arg Arg 225 230 235 240 Leu GlyThr Ser Leu Ala Gly Leu Ala Ile Ala Ala Ala Ala Leu Tyr 245 250 255 GlnHis Arg Leu Thr Gly Gln Arg Asp Val Val Val Gly Val Pro Val 260 265 270Ala Gly Arg Ser Lys Thr Ala Glu Leu Asp Ile Pro Gly Met Thr Ala 275 280285 Asn Val Val Pro Val Arg Leu Ala Val Ala Pro Lys Thr Thr Val Ala 290295 300 Glu Leu Val Arg Gln Val Ala Arg Gly Val Arg Asp Gly Leu Arg His305 310 315 320 Gln Arg Tyr Arg Tyr Glu Asp Ile Leu Asp Asp Leu Lys LeuVal Gly 325 330 335 Arg Asp Gly Leu Tyr Pro Leu Leu Val Asn Val Leu SerPhe Asp Tyr 340 345 350 Asp Leu Arg Phe Gly Asp Ala Val Ser Val Ala HisGly Leu Ser Ala 355 360 365 Gly Pro Val Asp Asp Val Ser Ile Asp Val TyrAsp Arg Ser Ser Asp 370 375 380 Gly Ser Met Lys Val Val Val Asp Val AsnPro Asp Leu Thr Asp Arg 385 390 395 400 Ser Asp Ala Asp Glu Val Ala ArgLys Phe Leu 
Ala Leu Leu Arg Trp 405 410 415 Leu Ala Glu Ser Asp Ala GluGlu Pro Val Ala Arg Ile Asp Leu Leu 420 425 430 Asp Glu Asp 435 2 451PRT Artificial Sequence HMMer software generated consensus sequence 2Ser Val Arg His Gly Val Thr Ala Ala Gln Arg Gly Val Trp Val Ala 1 5 1015 Gln Gln Leu Arg Pro Asp Ser Arg Leu Tyr Asn Cys Gly Leu Tyr Leu 20 2530 Glu Leu Asp Gly Ala Leu Asp Pro Ala Val Leu Ser Arg Ala Val Arg 35 4045 Arg Thr Leu Ala Glu Thr Glu Ala Leu Arg Ser Arg Phe Glu Glu Asp 50 5560 Asp Asp Gly Ala Leu Leu Gln Arg Val Leu Ala Pro Ala Pro Asp Glu 65 7075 80 Gln Thr Arg Leu Leu Glu Asp Gly Val Pro Tyr Thr Pro Val Leu Leu 8590 95 Arg His Ile Asp Leu Ser Gly Asp Asp Asp Pro Glu Ala Ala Ala Arg100 105 110 Arg Trp Met Asp Ala Asp Leu Ala Glu Pro Val Asp Leu Asp ArgAla 115 120 125 Gly Thr Ser Arg His Ala Leu Leu Thr Leu Gly Gly Asp ArgHis Leu 130 135 140 Leu Tyr Leu Gly Tyr His His Ile Ala Leu Asp Gly PheGly Ala Ala 145 150 155 160 Leu Tyr Leu Asp Arg Leu Ala Ala Val Tyr ArgAla Leu Arg Thr Gly 165 170 175 Arg Glu Pro Pro Pro Cys Pro Phe Gly ProLeu Asp Arg Leu Val Ala 180 185 190 Glu Glu Ala Ala Tyr Arg Asp Ser AlaArg His Arg Arg Asp Arg Ala 195 200 205 Tyr Trp Thr Gly Arg Phe Ala AspLeu Pro Glu Pro Val Gly Leu Ala 210 215 220 Gly Arg Ala Ala Ala Ala AlaPro Ala Pro Leu Arg Arg Thr Val Arg 225 230 235 240 Leu Pro Pro Glu ArgThr Ala Ala Leu Ala Ala Ala Ala Glu Ala Thr 245 250 255 Gly Ser Arg TrpPro Ala Val Val Ile Ala Ala Val Ala Ala Phe Leu 260 265 270 Arg Arg LeuAla Gly Ala Glu Glu Val Val Val Gly Leu Pro Val Thr 275 280 285 Ala ArgVal Thr Arg Ala Ala Leu Arg Thr Pro Gly Met Leu Ala Asn 290 295 300 ValLeu Pro Leu Arg Leu Glu Val Arg Gln Gly Ala Ser Phe Ala Ala 305 310 315320 Leu Leu Glu Glu Thr Ser Arg Ala Leu Ser Ala Leu Leu Arg His Gln 325330 335 Arg Phe Arg Gly Glu Asp Leu Gly Arg Glu Leu Gly Leu Ala Gly Glu340 345 350 Arg Ala Gly Leu Ala Pro Thr Thr Val Asn Val Met Ala Phe AlaPro 355 360 365 Val Leu Asp Phe Gly Asp Cys Arg Ala Val Val His Gln LeuSer Ser 370 375 380 Gly 
Pro Val Glu Asp Leu Ala Ile Asn Leu Tyr Gly ThrPro Gly Thr 385 390 395 400 Gly Asp Glu Leu Arg Val Thr Val Ala Ala AsnPro Ala Leu Tyr Thr 405 410 415 Ala Asp Asp Val Ala Ser Leu Gln Glu ArgLeu Val Arg Phe Leu Ala 420 425 430 Ala Leu Gly Ala Asp Pro Ala Ala ProVal Gly Arg Val Arg Leu Leu 435 440 445 Asp Pro Ala 450 3 603 PRTArtificial Sequence HMMer software generated consensus sequence 3 ValSer Ala Val Met Val Asp Leu Ala Ala Gly Pro Ser Val Pro Ala 1 5 10 15Ala Leu Arg Ala His Ala Glu Ala Arg Pro Asp Arg Thr Ala Val Val 20 25 30Phe Val Arg Asp Thr Asp Arg Ala Asp Gly Thr Ala Ser Leu Ser Tyr 35 40 45Ala Glu Leu Asp Arg Arg Ala Arg Ala Val Ala Val Trp Leu Arg Ala 50 55 60Arg Leu Ala Pro Gly Asp Arg Val Leu Leu Leu His Pro Ala Gly Pro 65 70 7580 Glu Phe Val Ala Ala Tyr Leu Gly Cys Leu Tyr Ala Gly Leu Val Ala 85 9095 Val Pro Ala Pro Leu Pro Gly Gly Tyr Ser His Glu Arg Arg Arg Val 100105 110 Val Gly Ile Ala Ala Asp Ala Gly Ala Gly Ala Val Leu Thr Asp Ala115 120 125 Asp Thr Glu Ala Glu Val Arg Glu Trp Leu Ala Glu Thr Gly LeuPro 130 135 140 Gly Leu Pro Val Leu Ala Val Asp Pro Leu Ala Ala Asp GlyAsp Pro 145 150 155 160 Gly Ala Trp Arg Pro Pro Gly Leu Arg Ala Asp ThrVal Ala Val Leu 165 170 175 Gln Tyr Thr Ser Gly Ser Thr Gly Ser Pro LysGly Val Val Val Thr 180 185 190 His Gly Asn Leu Leu Ala Asn Ala Arg SerLeu Ser Arg Ser Phe Gly 195 200 205 Leu Thr Glu Asp Thr Val Phe Gly GlyTrp Leu Pro Leu Tyr His Asp 210 215 220 Met Gly Leu Phe Gly Leu Leu LeuPro Ala Leu Phe Leu Gly Ala Thr 225 230 235 240 Val Val Leu Met Ser ProSer Ala Phe Leu Arg Arg Pro His Leu Trp 245 250 255 Leu Arg Leu Ile AspArg Phe Gly Val Val Phe Ser Ala Ala Pro Asp 260 265 270 Phe Ala Tyr AspLeu Cys Val Arg Arg Val Thr Asp Glu Gln Ile Ala 275 280 285 Gly Leu AspLeu Ser Arg Trp Arg Trp Ala Ala Asn Gly Ser Glu Pro 290 295 300 Ile ArgAla Ala Thr Leu Arg Ala Phe Ala Glu Arg Phe Ala Pro Ala 305 310 315 320Gly Leu Arg Pro Glu Ala Leu Thr Pro Cys Tyr Gly Leu Ala Glu Ala 325 330335 Thr Leu Phe Val Ser Gly Lys Ser Ala 
Gly Pro Leu Arg Thr Arg Arg 340345 350 Val Asp Pro Ala Ala Leu Glu Asp His Arg Phe Glu Glu Ala Val Pro355 360 365 Gly Arg Pro Ala Arg Glu Ile Val Ser Cys Gly Arg Val Pro AspLeu 370 375 380 Glu Val Arg Ile Val Asp Pro Gly Thr Gly Arg Pro Leu ProAsp Gly 385 390 395 400 Ala Val Gly Glu Ile Trp Leu Arg Gly Pro Ser ValAla Ala Gly Tyr 405 410 415 Trp Gly Arg Pro Glu Ala Thr Ala Glu Thr PheGly Ala Val Thr Asp 420 425 430 Gly Gly Asp Gly Pro Trp Leu Arg Thr GlyAsp Leu Gly Ala Leu Tyr 435 440 445 Glu Gly Glu Leu Tyr Val Thr Gly ArgIle Lys Glu Leu Leu Ile Val 450 455 460 His Gly Arg Asn Leu Tyr Pro HisAsp Ile Glu His Glu Leu Arg Ala 465 470 475 480 Ala His Asp Glu Leu AlaGly Ala Val Gly Ala Ala Phe Ala Val Pro 485 490 495 Ala Pro Gly Gly GlyGlu Glu Val Leu Val Val Val His Glu Val Arg 500 505 510 Pro Arg Val ProAla Asp Glu Leu Pro Ala Leu Ala Ser Ala Met Arg 515 520 525 Ala Thr ValAla Arg Glu Phe Gly Val Pro Ala Ala Gly Val Val Leu 530 535 540 Val ArgArg Gly Thr Val Arg Arg Thr Thr Ser Gly Lys Val Gln Arg 545 550 555 560Arg Ala Met Arg Glu Leu Phe Leu Thr Gly Glu Leu Ala Pro Val His 565 570575 Ala Glu Leu Gly Pro His Leu Gln Ala Ala Ala Ala Gly Glu Ala Arg 580585 590 Ala Ala Thr Ser Leu Ala Pro Ala Ser Thr Val 595 600 4 91 PRTArtificial Sequence HMMer software generated consensus sequence 4 MetSer Asp Leu Thr Ala Pro Pro Ala Arg His Thr Pro Glu Glu Leu 1 5 10 15Arg Ala Trp Leu Arg Glu Cys Val Ala Asp Tyr Val Gly Leu Pro Pro 20 25 30Ala Glu Ile Ala Thr Asp Val Pro Leu Thr Asp Tyr Gly Leu Asp Ser 35 40 45Val Tyr Ala Leu Ala Leu Cys Ala Glu Ile Glu Asp His Leu Gly Ile 50 55 60Glu Val Asp Pro Thr Leu Leu Trp Asp His Pro Thr Ile Asp Glu Leu 65 70 7580 Ser Ala Ala Leu Ala Pro Arg Leu Ala Arg Arg 85 90 5 1305 DNAActinoplanes sp. 
5 cctgacctgc gcccgctcac gcccgcccag ctcgccgtctggcacgcgca gcagctcgcc 60 ccgcacagcc ccgtctatca ggtcggcgag ttcgtcgagatcgacggcga gtgcgacccc 120 gatctcctgg tggcggcgtt gcgtcaggtc atgggcgaggccgagagcgc ccggctgcgg 180 ttccgcgtga tcgacggtac gccgtggcag tacgtcgccgaggacggcga cgacccgatc 240 caggtcgtgg acctcggcgc ggccgcggac ccgcgcgccgcggcgctggg ccgcatggcg 300 gccgacctcg accggcccgg cgacctgcgc gacggcccgctcgtcgagca ccacgtctac 360 ctgctcggcg agggccgggt catctggtac caccgcgcgcaccacatcgt ctgcgacggc 420 ggcagcctcg gcattgtcgc ctcccgggtg gccggcgtctattccgcgct cgcggccggt 480 ggtgacgtcc ggccgggtgc gctgccgccg ctgtcggtgttgctgtcggc cgccgacgcc 540 tacgagcgct ccggcgaccg cgaccgggac cgcgagcactggcgctccgc gctggcgggc 600 ctgcccgccg agctgctcgc gggcgcgggc cggccgcggccgctgcccgg accgccggtg 660 cgccacgagc acgacctctc cgcggcggag gcgggccggctgcgcgcggg ggcgcggcgg 720 ctgcggacca gcgtggcgca ggccggcatc gcggccgcggccctctacca gcaccggctc 780 accggcgccc gggacgtgct ggtggcggtg cccgtcgccggccgcaccac ccgcccggag 840 ttcgacgtgc ccggcatgac gtcgaacgtg gtgccggtgcgcctcgcggt cacgcccgcc 900 acgaccgtcg gcgagctgct gcgcgacgtc gcccgtggtgtccgcgacgg cctgcggcac 960 cagcggtacc cgtacccgaa catcgtggac gacctcggcctggccgaccg tgccgcgctg 1020 cgcccggtga ccgtcaacgc cctggcgctg ggacggccgctgcgcttcgg ctcggcggtg 1080 ggtgtgcgct ccggcctgtc ggcgggcccg gtggacgacgtcaccatcgg cctctacgaa 1140 aaggtcagcg gcggcggcat gcagacgatc gccgagctgaaccccgggcg cacggaccgc 1200 ccggacgcgg cggaggtctc ccgctggttc cgtacgctgctgcgcgggct ggccgagagc 1260 gacgccggcg acccggtggc ccgcatcgac atcgtcgacgagccc 1305 6 435 PRT Actinoplanes sp. 
6 Pro Asp Leu Arg Pro Leu Thr ProAla Gln Leu Ala Val Trp His Ala 1 5 10 15 Gln Gln Leu Ala Pro His SerPro Val Tyr Gln Val Gly Glu Phe Val 20 25 30 Glu Ile Asp Gly Glu Cys AspPro Asp Leu Leu Val Ala Ala Leu Arg 35 40 45 Gln Val Met Gly Glu Ala GluSer Ala Arg Leu Arg Phe Arg Val Ile 50 55 60 Asp Gly Thr Pro Trp Gln TyrVal Ala Glu Asp Gly Asp Asp Pro Ile 65 70 75 80 Gln Val Val Asp Leu GlyAla Ala Ala Asp Pro Arg Ala Ala Ala Leu 85 90 95 Gly Arg Met Ala Ala AspLeu Asp Arg Pro Gly Asp Leu Arg Asp Gly 100 105 110 Pro Leu Val Glu HisHis Val Tyr Leu Leu Gly Glu Gly Arg Val Ile 115 120 125 Trp Tyr His ArgAla His His Ile Val Cys Asp Gly Gly Ser Leu Gly 130 135 140 Ile Val AlaSer Arg Val Ala Gly Val Tyr Ser Ala Leu Ala Ala Gly 145 150 155 160 GlyAsp Val Arg Pro Gly Ala Leu Pro Pro Leu Ser Val Leu Leu Ser 165 170 175Ala Ala Asp Ala Tyr Glu Arg Ser Gly Asp Arg Asp Arg Asp Arg Glu 180 185190 His Trp Arg Ser Ala Leu Ala Gly Leu Pro Ala Glu Leu Leu Ala Gly 195200 205 Ala Gly Arg Pro Arg Pro Leu Pro Gly Pro Pro Val Arg His Glu His210 215 220 Asp Leu Ser Ala Ala Glu Ala Gly Arg Leu Arg Ala Gly Ala ArgArg 225 230 235 240 Leu Arg Thr Ser Val Ala Gln Ala Gly Ile Ala Ala AlaAla Leu Tyr 245 250 255 Gln His Arg Leu Thr Gly Ala Arg Asp Val Leu ValAla Val Pro Val 260 265 270 Ala Gly Arg Thr Thr Arg Pro Glu Phe Asp ValPro Gly Met Thr Ser 275 280 285 Asn Val Val Pro Val Arg Leu Ala Val ThrPro Ala Thr Thr Val Gly 290 295 300 Glu Leu Leu Arg Asp Val Ala Arg GlyVal Arg Asp Gly Leu Arg His 305 310 315 320 Gln Arg Tyr Pro Tyr Pro AsnIle Val Asp Asp Leu Gly Leu Ala Asp 325 330 335 Arg Ala Ala Leu Arg ProVal Thr Val Asn Ala Leu Ala Leu Gly Arg 340 345 350 Pro Leu Arg Phe GlySer Ala Val Gly Val Arg Ser Gly Leu Ser Ala 355 360 365 Gly Pro Val AspAsp Val Thr Ile Gly Leu Tyr Glu Lys Val Ser Gly 370 375 380 Gly Gly MetGln Thr Ile Ala Glu Leu Asn Pro Gly Arg Thr Asp Arg 385 390 395 400 ProAsp Ala Ala Glu Val Ser Arg Trp Phe Arg Thr Leu Leu Arg Gly 405 410 415Leu Ala Glu Ser Asp Ala Gly Asp Pro Val Ala Arg 
Ile Asp Ile Val 420 425430 Asp Glu Pro 435 7 1305 DNA Streptomyces roseosporus 7 tcgcagcgcctcggcgtcac cgccgcccaa cagagcgtct ggctcgccgg ccagctggcg 60 gacgaccaccgcctgtacca ctgtgcggcg tacctgtcac tcaccgggtc catcgacccg 120 cggacactcggcacggcggt ccggcggacc ctcgacgaga ccgaggcgct gcgtacccgg 180 ttcgtaccgcaggacgggga actgctgcag atcctcgaac ccggtgccgg acagctcctg 240 ctggaagccgacttctccgg cgacccggac cccgagcggg cggcacacga ctggatgcac 300 gcggcgctcgccgcaccggt ccgcctcgac cgcgccggga ccgccaccca cgccctgctc 360 accctcggcccgtcccgcca cctgctgtac ttcggctacc accacatcgc gctcgacggc 420 tacggtgccctgctccacct gcgccgcctc gcccacgtct acaccgccct cagcaacggg 480 gacgaccccggcccctgccc gttcggcccc ctggccggtg tcctcacgga ggaggcggcc 540 taccgtgactccgacaacca tcggcgcgac ggggaattct ggacccggtc cctcgccggt 600 gcggacgaggcccccgggct gagcgagcgg gaggccggcg ctctcgccgt cccgctgcgc 660 cgcaccgtggagctgtccgg cgaacggacg gagaagctgg ccgcctcggc cgcggccact 720 ggagctcgctggtcgtcact gctcgtcgcc gccaccgccg cgttcgtacg ccgccacgct 780 gccgccgacgacaccgtcat cggcctgccc gtcaccgccc ggctcaccgg gccggcgctg 840 cgtaccccgtgcatgctcgc caacgacgtg ccgctgcgcc tcgacgcccg gctcgatgcc 900 ccgttcgccgcgctccttgc cgacaccacc cgcgccgtcg gcacgctggc gcgccaccag 960 cggttccgcggggaagaact ccaccggaac ctggggggcg tcggccgcac cgcgggcctg 1020 gcgcgggtcaccgtcaacgt cctggcgtat gtcgacaaca tccggttcgg cgactgccgg 1080 gccgtggtccacgagttgtc ctcgggaccg gtccgcgact tccacatcaa ctcctacggc 1140 acccccggcacccccgacgg cgtccagctg gtcttcagcg gtaaccccgc cctgtacacg 1200 gccaccgatctggccgacca ccaggagcgg ttcctgcgct tcctcgacgc tgtgaccgcc 1260 gacccggacctgccgaccgg aagacaccgc ctcctgtcgc cgggc 1305 8 435 PRT Streptomycesroseosporus 8 Ser Gln Arg Leu Gly Val Thr Ala Ala Gln Gln Ser Val TrpLeu Ala 1 5 10 15 Gly Gln Leu Ala Asp Asp His Arg Leu Tyr His Cys AlaAla Tyr Leu 20 25 30 Ser Leu Thr Gly Ser Ile Asp Pro Arg Thr Leu Gly ThrAla Val Arg 35 40 45 Arg Thr Leu Asp Glu Thr Glu Ala Leu Arg Thr Arg PheVal Pro Gln 50 55 60 Asp Gly Glu Leu Leu Gln Ile Leu Glu Pro Gly Ala GlyGln Leu Leu 65 70 75 80 Leu Glu Ala 
Asp Phe Ser Gly Asp Pro Asp Pro GluArg Ala Ala His 85 90 95 Asp Trp Met His Ala Ala Leu Ala Ala Pro Val ArgLeu Asp Arg Ala 100 105 110 Gly Thr Ala Thr His Ala Leu Leu Thr Leu GlyPro Ser Arg His Leu 115 120 125 Leu Tyr Phe Gly Tyr His His Ile Ala LeuAsp Gly Tyr Gly Ala Leu 130 135 140 Leu His Leu Arg Arg Leu Ala His ValTyr Thr Ala Leu Ser Asn Gly 145 150 155 160 Asp Asp Pro Gly Pro Cys ProPhe Gly Pro Leu Ala Gly Val Leu Thr 165 170 175 Glu Glu Ala Ala Tyr ArgAsp Ser Asp Asn His Arg Arg Asp Gly Glu 180 185 190 Phe Trp Thr Arg SerLeu Ala Gly Ala Asp Glu Ala Pro Gly Leu Ser 195 200 205 Glu Arg Glu AlaGly Ala Leu Ala Val Pro Leu Arg Arg Thr Val Glu 210 215 220 Leu Ser GlyGlu Arg Thr Glu Lys Leu Ala Ala Ser Ala Ala Ala Thr 225 230 235 240 GlyAla Arg Trp Ser Ser Leu Leu Val Ala Ala Thr Ala Ala Phe Val 245 250 255Arg Arg His Ala Ala Ala Asp Asp Thr Val Ile Gly Leu Pro Val Thr 260 265270 Ala Arg Leu Thr Gly Pro Ala Leu Arg Thr Pro Cys Met Leu Ala Asn 275280 285 Asp Val Pro Leu Arg Leu Asp Ala Arg Leu Asp Ala Pro Phe Ala Ala290 295 300 Leu Leu Ala Asp Thr Thr Arg Ala Val Gly Thr Leu Ala Arg HisGln 305 310 315 320 Arg Phe Arg Gly Glu Glu Leu His Arg Asn Leu Gly GlyVal Gly Arg 325 330 335 Thr Ala Gly Leu Ala Arg Val Thr Val Asn Val LeuAla Tyr Val Asp 340 345 350 Asn Ile Arg Phe Gly Asp Cys Arg Ala Val ValHis Glu Leu Ser Ser 355 360 365 Gly Pro Val Arg Asp Phe His Ile Asn SerTyr Gly Thr Pro Gly Thr 370 375 380 Pro Asp Gly Val Gln Leu Val Phe SerGly Asn Pro Ala Leu Tyr Thr 385 390 395 400 Ala Thr Asp Leu Ala Asp HisGln Glu Arg Phe Leu Arg Phe Leu Asp 405 410 415 Ala Val Thr Ala Asp ProAsp Leu Pro Thr Gly Arg His Arg Leu Leu 420 425 430 Ser Pro Gly 435 91359 DNA Streptomyces fradiae 9 gcacaccgtg tggccgccac gtcggcccagaccgggatct ggacggcgca gcgtctgcgc 60 ggggacgaca ggctctacgc ctgcggcctcttcctcgaac tcgaccacgt ggtggaggag 120 gtgctgagcg aggcgatccg ccgcgccgtcgccgacaccg aggcgctgcg caccgcgttc 180 cgggaggacg cggacggcgc gctggagcagcacgtcctcg cccggccgcc gagcacgcag 240 acccgcctct tccacgccga 
cccgagcggcggaaccccct cccgctccgc gtccctggac 300 tggatggacc ggcaacgggc gcaaccctgggacctcgcgt cgggcgacac ctgccgtcat 360 accctgatcc ccctcggcgg cgaccgctcgctgctgcacc tgcgttacca ccacctcgcc 420 ctggacgggt acggcgccgc gctctatctggaccggctcg cggcggtcta ccgcgcgctg 480 cgcaccggcc atcaaccgcc cccctgcgcgttcgcgccgc tggcccgcct ggtcgaggag 540 gaccacgcct accggaactc cgcccgtcaccgcgcggacg ccaatcactg gcgcgaccgc 600 ttcgcggacc tcccgcgccc caccagcctcgccgacgcca ccacgcccgc ggcgcccacc 660 acgcccgcca cgcccgccgc gcccgccgcgcccgacgaac tgcggcgcac cgtgcgcctg 720 tccgccgccc ggtccgccgc gctgcgccgtgcctcggacc ggagcggccg accctggccc 780 gtgtacgcca cggccgcggt ggccgccttcctgagccgac tcgcgccggg ggaggaggtc 840 gtcgtcggcc tcccggtcac cgccagggtgacccccgccg cggtgcgcac accggggatg 900 ctcgccaacg tcgtaccgct tcgcctgcccgtccggcagg gcatgtcgac ggcggagctg 960 ctggagctga ccgcggccga gatcagcaccacactgcgcc accagcgcca ccgcaccgag 1020 gacatcgggc gggcgctcgg actccacggcgctccgccag ccaccacact cgtgaacgtc 1080 atggcgttcg ccccggtcct cgacttcggcgactgccggg ccccggtgca ccagctctcg 1140 gccggaccgg tggaggacct ggtcgtcaacctcctcggca ccccgggcga cggcggcgag 1200 agcgacggca ccgagctgga gatcactgtcgccgccaacc cccgcctcca ctcggcggac 1260 gcggtggcct cgctggccgc gcggctcgcggagttcctca cgcacatggg gcaggacgcc 1320 gaggcgcccc tcggccggac ccggctgctcgacgcggag 1359 10 453 PRT Streptomyces fradiae 10 Ala His Arg Val AlaAla Thr Ser Ala Gln Thr Gly Ile Trp Thr Ala 1 5 10 15 Gln Arg Leu ArgGly Asp Asp Arg Leu Tyr Ala Cys Gly Leu Phe Leu 20 25 30 Glu Leu Asp HisVal Val Glu Glu Val Leu Ser Glu Ala Ile Arg Arg 35 40 45 Ala Val Ala AspThr Glu Ala Leu Arg Thr Ala Phe Arg Glu Asp Ala 50 55 60 Asp Gly Ala LeuGlu Gln His Val Leu Ala Arg Pro Pro Ser Thr Gln 65 70 75 80 Thr Arg LeuPhe His Ala Asp Pro Ser Gly Gly Thr Pro Ser Arg Ser 85 90 95 Ala Ser LeuAsp Trp Met Asp Arg Gln Arg Ala Gln Pro Trp Asp Leu 100 105 110 Ala SerGly Asp Thr Cys Arg His Thr Leu Ile Pro Leu Gly Gly Asp 115 120 125 ArgSer Leu Leu His Leu Arg Tyr His His Leu Ala Leu Asp Gly Tyr 130 135 140Gly Ala Ala Leu Tyr Leu Asp Arg 
Leu Ala Ala Val Tyr Arg Ala Leu 145 150155 160 Arg Thr Gly His Gln Pro Pro Pro Cys Ala Phe Ala Pro Leu Ala Arg165 170 175 Leu Val Glu Glu Asp His Ala Tyr Arg Asn Ser Ala Arg His ArgAla 180 185 190 Asp Ala Asn His Trp Arg Asp Arg Phe Ala Asp Leu Pro ArgPro Thr 195 200 205 Ser Leu Ala Asp Ala Thr Thr Pro Ala Ala Pro Thr ThrPro Ala Thr 210 215 220 Pro Ala Ala Pro Ala Ala Pro Asp Glu Leu Arg ArgThr Val Arg Leu 225 230 235 240 Ser Ala Ala Arg Ser Ala Ala Leu Arg ArgAla Ser Asp Arg Ser Gly 245 250 255 Arg Pro Trp Pro Val Tyr Ala Thr AlaAla Val Ala Ala Phe Leu Ser 260 265 270 Arg Leu Ala Pro Gly Glu Glu ValVal Val Gly Leu Pro Val Thr Ala 275 280 285 Arg Val Thr Pro Ala Ala ValArg Thr Pro Gly Met Leu Ala Asn Val 290 295 300 Val Pro Leu Arg Leu ProVal Arg Gln Gly Met Ser Thr Ala Glu Leu 305 310 315 320 Leu Glu Leu ThrAla Ala Glu Ile Ser Thr Thr Leu Arg His Gln Arg 325 330 335 His Arg ThrGlu Asp Ile Gly Arg Ala Leu Gly Leu His Gly Ala Pro 340 345 350 Pro AlaThr Thr Leu Val Asn Val Met Ala Phe Ala Pro Val Leu Asp 355 360 365 PheGly Asp Cys Arg Ala Pro Val His Gln Leu Ser Ala Gly Pro Val 370 375 380Glu Asp Leu Val Val Asn Leu Leu Gly Thr Pro Gly Asp Gly Gly Glu 385 390395 400 Ser Asp Gly Thr Glu Leu Glu Ile Thr Val Ala Ala Asn Pro Arg Leu405 410 415 His Ser Ala Asp Ala Val Ala Ser Leu Ala Ala Arg Leu Ala GluPhe 420 425 430 Leu Thr His Met Gly Gln Asp Ala Glu Ala Pro Leu Gly ArgThr Arg 435 440 445 Leu Leu Asp Ala Glu 450 11 1296 DNA Streptomycesghanaensis 11 tcggtgcgtc atggggtgct ggccgcgcag cgagaggtct gggtggcccagcaactgcgg 60 ccgctcagcc ctcggttcaa ctgcggcgtt ttcctggacg tcggcgaggccctcgacgcc 120 gccgtgctcc gccgcgccgt gacccgtgcc ctggaggaga cggagacgctgcgctcactg 180 ttcgccgaac aggacggcga cggcgagatc atccggacca cgcggccggcccccgacgac 240 tgcgtgacga caatcgacgt gcgcgacgcg gacgacccgg tcgccgcggcacggcggtgg 300 atggacgccg acctggccga gccggtcgac ctgcggcacg acccgagctaccggcacgtg 360 ctgttccgga tcggcgagcg gcgctccatc ttctacttcc gctaccaccacatcacgctc 420 gacggtttcg ggcagaccct gtacctgaac cgggtggccg 
acctctacacggccctggcc 480 accgccaccg agccggacgc ggccccgttc ggcggcctgg accgcctgctggacgaggaa 540 cggcagtacg aggactccgc caggtgcgcc gaggaccggg cccactggcacaccaccgcc 600 cggtccctcg ccgagggacg cggcagcggc ccggcggccg cctcggaccaggtgctccgc 660 gacacggtgc ggctgccgcg ggaactgacc gacgcggtgt gcgcccacgcgcggtcgcac 720 ggttcgcgct ggaccgcggt gatgctgggt gccgtggcgg cctgtgcccggcggcggcag 780 ggcgacgacg cggtcgtgat cgacctgccg gtgaccgccc gcacgacccgggccgcgctg 840 acgacgcccg gcatgatgtc gaacgtgctg ccgctgcggc tggaggtcgcgcgcgacgcg 900 gacctgcgcg ccctcacgga ggaggtgtcc cgggcactgc cggcgacactccggcaccag 960 cgcttccgcg gcgaggagct gtaccgcgag ctcggcgcgg gcggcgcgcggggacacctc 1020 tcggtgaacg tgatgccgtt cgaccaccag gtgcggttcg gcaccgcgccggcgaccctg 1080 caccaactgg ccaacggcca ggtgcacgag gtggcgatcg acgtgtacgggacccccgac 1140 aagggcggcg acatccacgt caccgtgcac gccaacgccc ggacgcacaccgtcgaggac 1200 gtccggcagt ggcaccggga gctgcgccgc atgctcgtcc acctcctcggcggaccgggc 1260 cgcacggtcg gcgaggccga actgctcgac gaggcc 1296 12 432 PRTStreptomyces ghanaensis 12 Ser Val Arg His Gly Val Leu Ala Ala Gln ArgGlu Val Trp Val Ala 1 5 10 15 Gln Gln Leu Arg Pro Leu Ser Pro Arg PheAsn Cys Gly Val Phe Leu 20 25 30 Asp Val Gly Glu Ala Leu Asp Ala Ala ValLeu Arg Arg Ala Val Thr 35 40 45 Arg Ala Leu Glu Glu Thr Glu Thr Leu ArgSer Leu Phe Ala Glu Gln 50 55 60 Asp Gly Asp Gly Glu Ile Ile Arg Thr ThrArg Pro Ala Pro Asp Asp 65 70 75 80 Cys Val Thr Thr Ile Asp Val Arg AspAla Asp Asp Pro Val Ala Ala 85 90 95 Ala Arg Arg Trp Met Asp Ala Asp LeuAla Glu Pro Val Asp Leu Arg 100 105 110 His Asp Pro Ser Tyr Arg His ValLeu Phe Arg Ile Gly Glu Arg Arg 115 120 125 Ser Ile Phe Tyr Phe Arg TyrHis His Ile Thr Leu Asp Gly Phe Gly 130 135 140 Gln Thr Leu Tyr Leu AsnArg Val Ala Asp Leu Tyr Thr Ala Leu Ala 145 150 155 160 Thr Ala Thr GluPro Asp Ala Ala Pro Phe Gly Gly Leu Asp Arg Leu 165 170 175 Leu Asp GluGlu Arg Gln Tyr Glu Asp Ser Ala Arg Cys Ala Glu Asp 180 185 190 Arg AlaHis Trp His Thr Thr Ala Arg Ser Leu Ala Glu Gly Arg Gly 195 200 205 SerGly Pro Ala Ala Ala 
Ser Asp Gln Val Leu Arg Asp Thr Val Arg 210 215 220Leu Pro Arg Glu Leu Thr Asp Ala Val Cys Ala His Ala Arg Ser His 225 230235 240 Gly Ser Arg Trp Thr Ala Val Met Leu Gly Ala Val Ala Ala Cys Ala245 250 255 Arg Arg Arg Gln Gly Asp Asp Ala Val Val Ile Asp Leu Pro ValThr 260 265 270 Ala Arg Thr Thr Arg Ala Ala Leu Thr Thr Pro Gly Met MetSer Asn 275 280 285 Val Leu Pro Leu Arg Leu Glu Val Ala Arg Asp Ala AspLeu Arg Ala 290 295 300 Leu Thr Glu Glu Val Ser Arg Ala Leu Pro Ala ThrLeu Arg His Gln 305 310 315 320 Arg Phe Arg Gly Glu Glu Leu Tyr Arg GluLeu Gly Ala Gly Gly Ala 325 330 335 Arg Gly His Leu Ser Val Asn Val MetPro Phe Asp His Gln Val Arg 340 345 350 Phe Gly Thr Ala Pro Ala Thr LeuHis Gln Leu Ala Asn Gly Gln Val 355 360 365 His Glu Val Ala Ile Asp ValTyr Gly Thr Pro Asp Lys Gly Gly Asp 370 375 380 Ile His Val Thr Val HisAla Asn Ala Arg Thr His Thr Val Glu Asp 385 390 395 400 Val Arg Gln TrpHis Arg Glu Leu Arg Arg Met Leu Val His Leu Leu 405 410 415 Gly Gly ProGly Arg Thr Val Gly Glu Ala Glu Leu Leu Asp Glu Ala 420 425 430 13 1314DNA Streptomyces refuineus 13 gcagaccgcg tggccgccac ctcggcccagtccgggatct ggacggcaca gcggctgcgc 60 tcggatgacc ggctctacac ctgcggcctctacctcgaac tcgaccacgt ggtggaggag 120 gtgctgggcg aggcgatcgg ccgtgcggtcgccgacaccg aggcgctgcg caccgccttc 180 ggggaggacg gggacggcgc gctggaacagcgcgtgctcg cgcggccgcc ggacacgcag 240 acacggctgt tccggctgga cctgggcggagacgaccggc cccgcgccga ggccctggac 300 tggatggacc ggcagcaggc ggaaccgtgggacctcgccg ccggcgacac ctgccggcac 360 accctgatcc gcctcggcgg ccaccgcaccgtcctgcacc tgcgctacca ccacctcgcc 420 ctggacgggt tcggtgccgc gctctacctggacaggatcg cggcggtgta ccgggcgctg 480 cgcaccggcc gggagacgcc cccctgcaccttcgcgccgc tggcccgcct cgtggaggag 540 gaccgcgcct accggcggtc cgcccgccaccgcagggacg ccgaccactg gcggacgcgc 600 ttcgcggacc tcccccgccc caccagcctcgccggcgccg ccgcgcccgc cgcgcccgcc 660 gcgctgcgcc acacggtccg cgtgtccgcggccgacaccg ccgcactggg cctgcgggcg 720 gaccggagcg gcagcacctg gccggtgttcgccacggccg cggtggccgc cttcctgagc 780 cgcctcgcgc cgggggagga 
ggtcgtcgtcggcttcccgg tcaccgccag ggtcacgccc 840 gccgcggtgc gcacgccggg gatgctggcgaacgtcgtgc cgctccggat ccgggtgcgg 900 caggggatgt cgttcgccgc gctgctggaccggaccgcgg ccgagatcgg cgccacgctg 960 cggcaccagc gccaccgcac cgaggacatcggccgggcgc tcggcctccc cccgcacggc 1020 gcccagccgg ccccgaccct ggtcaacgtcatggccttcg ccccggtgct cgacttcggc 1080 gactgcctct cgccggtgca ccagctgtcggccggcccgg tcgaggacct ggcggtcaac 1140 ctgctcggca cccccgggga cggccgggagctggagatca ccgtcgccgc caaccccctg 1200 ctccactcgg aggacgcggt ggcgtcgctggccgcgcggc tggcggagtt cctggcgcgc 1260 gcgggcgagc acgccgacgc cccgatcggccggacacgcc tgctcggcgc ggcg 1314 14 438 PRT Streptomyces refuineus 14 AlaAsp Arg Val Ala Ala Thr Ser Ala Gln Ser Gly Ile Trp Thr Ala 1 5 10 15Gln Arg Leu Arg Ser Asp Asp Arg Leu Tyr Thr Cys Gly Leu Tyr Leu 20 25 30Glu Leu Asp His Val Val Glu Glu Val Leu Gly Glu Ala Ile Gly Arg 35 40 45Ala Val Ala Asp Thr Glu Ala Leu Arg Thr Ala Phe Gly Glu Asp Gly 50 55 60Asp Gly Ala Leu Glu Gln Arg Val Leu Ala Arg Pro Pro Asp Thr Gln 65 70 7580 Thr Arg Leu Phe Arg Leu Asp Leu Gly Gly Asp Asp Arg Pro Arg Ala 85 9095 Glu Ala Leu Asp Trp Met Asp Arg Gln Gln Ala Glu Pro Trp Asp Leu 100105 110 Ala Ala Gly Asp Thr Cys Arg His Thr Leu Ile Arg Leu Gly Gly His115 120 125 Arg Thr Val Leu His Leu Arg Tyr His His Leu Ala Leu Asp GlyPhe 130 135 140 Gly Ala Ala Leu Tyr Leu Asp Arg Ile Ala Ala Val Tyr ArgAla Leu 145 150 155 160 Arg Thr Gly Arg Glu Thr Pro Pro Cys Thr Phe AlaPro Leu Ala Arg 165 170 175 Leu Val Glu Glu Asp Arg Ala Tyr Arg Arg SerAla Arg His Arg Arg 180 185 190 Asp Ala Asp His Trp Arg Thr Arg Phe AlaAsp Leu Pro Arg Pro Thr 195 200 205 Ser Leu Ala Gly Ala Ala Ala Pro AlaAla Pro Ala Ala Leu Arg His 210 215 220 Thr Val Arg Val Ser Ala Ala AspThr Ala Ala Leu Gly Leu Arg Ala 225 230 235 240 Asp Arg Ser Gly Ser ThrTrp Pro Val Phe Ala Thr Ala Ala Val Ala 245 250 255 Ala Phe Leu Ser ArgLeu Ala Pro Gly Glu Glu Val Val Val Gly Phe 260 265 270 Pro Val Thr AlaArg Val Thr Pro Ala Ala Val Arg Thr Pro Gly Met 275 280 285 Leu Ala AsnVal Val 
Pro Leu Arg Ile Arg Val Arg Gln Gly Met Ser 290 295 300 Phe AlaAla Leu Leu Asp Arg Thr Ala Ala Glu Ile Gly Ala Thr Leu 305 310 315 320Arg His Gln Arg His Arg Thr Glu Asp Ile Gly Arg Ala Leu Gly Leu 325 330335 Pro Pro His Gly Ala Gln Pro Ala Pro Thr Leu Val Asn Val Met Ala 340345 350 Phe Ala Pro Val Leu Asp Phe Gly Asp Cys Leu Ser Pro Val His Gln355 360 365 Leu Ser Ala Gly Pro Val Glu Asp Leu Ala Val Asn Leu Leu GlyThr 370 375 380 Pro Gly Asp Gly Arg Glu Leu Glu Ile Thr Val Ala Ala AsnPro Leu 385 390 395 400 Leu His Ser Glu Asp Ala Val Ala Ser Leu Ala AlaArg Leu Ala Glu 405 410 415 Phe Leu Ala Arg Ala Gly Glu His Ala Asp AlaPro Ile Gly Arg Thr 420 425 430 Arg Leu Leu Gly Ala Ala 435 15 1296 DNAStreptomyces aizunensis 15 ggtgggctcc gggaaatgat ggcgggccag ctcgcgatctggtacagcca tcagttggcg 60 cccgagaacc cgtgcttcaa cggtgccgag tacctggcgcttgacggaga cgtggatctg 120 ggcctcctgg tgaaggcctc gcagcggctg atggaagaggcggacgccgc ccggctgcgg 180 atccgtgaag tggacgggca gccgaggcag tacttccacgacgtggagga ctaccccgtc 240 gaggtcatcg acatcagctc cgaggccgat ccccaggcggcggccgagag cctgatgtgg 300 gaagacctgc ggggcgagcg gggagcggcc gaccgctccctctacaccat caagatctac 360 acggccggtc cccggctcac cttctggtac cagcgggcctaccacgtgat cctggacggc 420 cgcagcgcgg gcctggtggt cggccgcctg tcgcaggtgtacaacaccct gctccagggc 480 ggttccgtgg aagagggcgc cctgccctcc agcaccgtcctgatggacgc ggaacgcgag 540 taccggacct ccgaggccca cgaggccgac cgggagtactggcgcggcgt cctcgcgggc 600 ctccccgagg ccgagggcct cggcagcaac tacggcggccgcgcccagcg cgcccccatc 660 cggttcgtgg agagcgtcgg cgacgccgtc gccacggacctgaagacggc cgcccgcggg 720 ctggggacga acttcgccgg cctgatgatc agcgccgccgccctctacca gcaccacctc 780 accggacagc aggacgtggt cgtcggcgtc ccggtcagcggccgctccgg aacgcgcgac 840 ctcgccattc cgttcatgac caacaacgtc cttccgatccgggtgacgat cgccccggac 900 acctcggtcg ccgacctcgt gcggcagacc acgcgcgccgtgatgaaggg cctgcgccac 960 cagcgctacc gctacgagca catgctcaac gacgcgatgctcggcgaggg cggtctctgg 1020 gacctgctca tcaacgtgat gtccttcgac atctacgccctccccttcgg cgactgcacc 1080 gtcaccgcgc acaacctctc 
cagcggcccc gtcgacagcacgcgcatcga cgtgtacgac 1140 cgctccggcc tgaagatcgc cgtcgacgtc aaccccgacgcccccgacct gtcgccgggc 1200 gacgaggtct gccgtcgctt cctggcgctc gcgcactggctcgtctcggt cgatcccgcc 1260 gaaccggtcg gccgctccgg cctgctggac gcggac 129616 432 PRT Streptomyces aizunensis 16 Gly Gly Leu Arg Glu Met Met AlaGly Gln Leu Ala Ile Trp Tyr Ser 1 5 10 15 His Gln Leu Ala Pro Glu AsnPro Cys Phe Asn Gly Ala Glu Tyr Leu 20 25 30 Ala Leu Asp Gly Asp Val AspLeu Gly Leu Leu Val Lys Ala Ser Gln 35 40 45 Arg Leu Met Glu Glu Ala AspAla Ala Arg Leu Arg Ile Arg Glu Val 50 55 60 Asp Gly Gln Pro Arg Gln TyrPhe His Asp Val Glu Asp Tyr Pro Val 65 70 75 80 Glu Val Ile Asp Ile SerSer Glu Ala Asp Pro Gln Ala Ala Ala Glu 85 90 95 Ser Leu Met Trp Glu AspLeu Arg Gly Glu Arg Gly Ala Ala Asp Arg 100 105 110 Ser Leu Tyr Thr IleLys Ile Tyr Thr Ala Gly Pro Arg Leu Thr Phe 115 120 125 Trp Tyr Gln ArgAla Tyr His Val Ile Leu Asp Gly Arg Ser Ala Gly 130 135 140 Leu Val ValGly Arg Leu Ser Gln Val Tyr Asn Thr Leu Leu Gln Gly 145 150 155 160 GlySer Val Glu Glu Gly Ala Leu Pro Ser Ser Thr Val Leu Met Asp 165 170 175Ala Glu Arg Glu Tyr Arg Thr Ser Glu Ala His Glu Ala Asp Arg Glu 180 185190 Tyr Trp Arg Gly Val Leu Ala Gly Leu Pro Glu Ala Glu Gly Leu Gly 195200 205 Ser Asn Tyr Gly Gly Arg Ala Gln Arg Ala Pro Ile Arg Phe Val Glu210 215 220 Ser Val Gly Asp Ala Val Ala Thr Asp Leu Lys Thr Ala Ala ArgGly 225 230 235 240 Leu Gly Thr Asn Phe Ala Gly Leu Met Ile Ser Ala AlaAla Leu Tyr 245 250 255 Gln His His Leu Thr Gly Gln Gln Asp Val Val ValGly Val Pro Val 260 265 270 Ser Gly Arg Ser Gly Thr Arg Asp Leu Ala IlePro Phe Met Thr Asn 275 280 285 Asn Val Leu Pro Ile Arg Val Thr Ile AlaPro Asp Thr Ser Val Ala 290 295 300 Asp Leu Val Arg Gln Thr Thr Arg AlaVal Met Lys Gly Leu Arg His 305 310 315 320 Gln Arg Tyr Arg Tyr Glu HisMet Leu Asn Asp Ala Met Leu Gly Glu 325 330 335 Gly Gly Leu Trp Asp LeuLeu Ile Asn Val Met Ser Phe Asp Ile Tyr 340 345 350 Ala Leu Pro Phe GlyAsp Cys Thr Val Thr Ala His Asn Leu Ser Ser 355 360 365 Gly Pro 
Val AspSer Thr Arg Ile Asp Val Tyr Asp Arg Ser Gly Leu 370 375 380 Lys Ile AlaVal Asp Val Asn Pro Asp Ala Pro Asp Leu Ser Pro Gly 385 390 395 400 AspGlu Val Cys Arg Arg Phe Leu Ala Leu Ala His Trp Leu Val Ser 405 410 415Val Asp Pro Ala Glu Pro Val Gly Arg Ser Gly Leu Leu Asp Ala Asp 420 425430 17 1293 DNA Actinomycete 17 gtggatcgtc gtcccgtctc cgccgcccagctgggcatct gggtcgcgca gcaggtgctg 60 ccggacagtc ctctgtacaa ctgcggctgctactacgaga tcggcgcggc cgatcccggg 120 ctgctcgacc gcgcggtccg gcacacgctggccgagaccg aggcgctgcg gtcgcgcttc 180 gagacgatcg acgaccagct gtggcagctcgtcgggccgg ccgaccccga gccgctggag 240 gtcgtcgacc tgcgcgcgga gcccgacccggaggcggccg cccggcgctg gatgggggcc 300 gcgatggccg aggtccggcc cctgggccgggccccgctga gccgccaggc cgtgctgctg 360 ctcggcgcgg accgccggct gtggttccacggctaccacc acgccgtgct ggacggcttc 420 ggccagtccg tctacgccgc ccgggtggcgcaggtctacg ccgccctggc cgccggccgg 480 accccgccgg agcgggtctt cgcgacgctggacgaggcgc acgccgacgc cgcggtggac 540 cccgcgtccc gccggttcgc cgccgaccgggactactggc tcggcgcctt cgccgaccgg 600 ccggagccgg tcgggctggc cggccgggccggcgccgccg ggccgaccca gctgcgccgg 660 atccgcccgc tgccgccggg gtgcgccgcccggttcgcgg cggcggccga ggcggtgggc 720 agcacctggc cggccgccgt gatcgccgcggtggccgcct actaccaccg gatgaccggg 780 cgcgaggaga tcgtcttcgc gctgccgctggccggccgcc gcggccgggc ctcgctgagc 840 acgcccgggg ccctggtcaa cgtgctgccgatccggctct cggtgagctc ccgggccacc 900 ttcgccgagc tggcccggca ggccggccgccggctggccg acgtgctgcg ccatcagcgc 960 ttccgcggcg agcagctgtt ccaggagctgggcctgtccg gcgagcgcgc gttctggggg 1020 cccacggtca acgtgatggg cttcggcggcgacctggccc tgggcccggt caccgcggtg 1080 ccgcacccgc tcgcgaccgg cccggtccaggacctcaaga tcaacttcta cggtacgccg 1140 gccacgggcg tccgcctcga actcgacgccgacccggccc gctacgacgc ggtggccgtc 1200 gcggagcacc aggaccggct gatccggctgctgacggcgc tcgggcacga cccggcgacc 1260 cggatcggcg ccgtcgacct gctcgaccccgcc 1293 18 431 PRT Actinomycete 18 Val Asp Arg Arg Pro Val Ser Ala AlaGln Leu Gly Ile Trp Val Ala 1 5 10 15 Gln Gln Val Leu Pro Asp Ser ProLeu Tyr Asn Cys Gly Cys Tyr Tyr 20 25 30 Glu Ile 
Gly Ala Ala Asp Pro Gly Leu Leu Asp Arg Ala Val Arg His 35 40 45 Thr Leu Ala Glu Thr Glu Ala Leu Arg Ser Arg Phe Glu Thr Ile Asp 50 55 60 Asp Gln Leu Trp Gln Leu Val Gly Pro Ala Asp Pro Glu Pro Leu Glu 65 70 75 80 Val Val Asp Leu Arg Ala Glu Pro Asp Pro Glu Ala Ala Ala Arg Arg 85 90 95 Trp Met Gly Ala Ala Met Ala Glu Val Arg Pro Leu Gly Arg Ala Pro 100 105 110 Leu Ser Arg Gln Ala Val Leu Leu Leu Gly Ala Asp Arg Arg Leu Trp 115 120 125 Phe His Gly Tyr His His Ala Val Leu Asp Gly Phe Gly Gln Ser Val 130 135 140 Tyr Ala Ala Arg Val Ala Gln Val Tyr Ala Ala Leu Ala Ala Gly Arg 145 150 155 160 Thr Pro Pro Glu Arg Val Phe Ala Thr Leu Asp Glu Ala His Ala Asp 165 170 175 Ala Ala Val Asp Pro Ala Ser Arg Arg Phe Ala Ala Asp Arg Asp Tyr 180 185 190 Trp Leu Gly Ala Phe Ala Asp Arg Pro Glu Pro Val Gly Leu Ala Gly 195 200 205 Arg Ala Gly Ala Ala Gly Pro Thr Gln Leu Arg Arg Ile Arg Pro Leu 210 215 220 Pro Pro Gly Cys Ala Ala Arg Phe Ala Ala Ala Ala Glu Ala Val Gly 225 230 235 240 Ser Thr Trp Pro Ala Ala Val Ile Ala Ala Val Ala Ala Tyr Tyr His 245 250 255 Arg Met Thr Gly Arg Glu Glu Ile Val Phe Ala Leu Pro Leu Ala Gly 260 265 270 Arg Arg Gly Arg Ala Ser Leu Ser Thr Pro Gly Ala Leu Val Asn Val 275 280 285 Leu Pro Ile Arg Leu Ser Val Ser Ser Arg Ala Thr Phe Ala Glu Leu 290 295 300 Ala Arg Gln Ala Gly Arg Arg Leu Ala Asp Val Leu Arg His Gln Arg 305 310 315 320 Phe Arg Gly Glu Gln Leu Phe Gln Glu Leu Gly Leu Ser Gly Glu Arg 325 330 335 Ala Phe Trp Gly Pro Thr Val Asn Val Met Gly Phe Gly Gly Asp Leu 340 345 350 Ala Leu Gly Pro Val Thr Ala Val Pro His Pro Leu Ala Thr Gly Pro 355 360 365 Val Gln Asp Leu Lys Ile Asn Phe Tyr Gly Thr Pro Ala Thr Gly Val 370 375 380 Arg Leu Glu Leu Asp Ala Asp Pro Ala Arg Tyr Asp Ala Val Ala Val 385 390 395 400 Ala Glu His Gln Asp Arg Leu Ile Arg Leu Leu Thr Ala Leu Gly His 405 410 415 Asp Pro Ala Thr Arg Ile Gly Ala Val Asp Leu Leu Asp Pro Ala 420 425 430 19 1305 DNA Streptomyces sp.
(Ecopia strain) 19 ggcggccgtc gtgagctgatggccggacag cttggcttat ggcatgcgca gcaactgaat 60 ccggataatc cgatctataacatgggtgaa tacatagaga ttcgcggaaa ggtggacacg 120 agcttattcg aggcggctgtgcgaagggtc gtcctggaag tcgacggttt tggtctgcgc 180 tttgagggag gtgctgacgaagttccgcgg caatattttg gcctgcggag cgattggctg 240 tttcatgtga tcgacgtgagcggcgaggag gacccccgtt ccagcgcgga gagttggatg 300 cgggcggaca tgcgacgcccggtggatctc caggtcggtg aactcttcac ccaggccatc 360 atcaaggtgg acgaagatctcttcttctgg tatcagcgaa tacaccacat catcgcggac 420 ggactcgcgg gaccccggatagcctcccga gtggctgcgg tctacacggc actgtcggcc 480 ggcgaacccc tcgcggacagtgcgcctccc tcgagttccg tactgatgga cgccgatgcc 540 gactaccggg cgtccccagaattcgaactg gaccggcagt actggacgga gcgtctttcc 600 gatcgccctc aaacggtcagcttgagcggc caggaacctt ccaccacacc ccatgaactg 660 acacggcata cgctccacatcccacccgac gccgctgcgg aactcagaag ctccgcccgt 720 cggctgggaa cgagcctctcgggtctggcc gtagccgcga gcgccgccta cctgcatcgc 780 gcaacgggac aagaggacatcattctcggg gtccccgtaa tgggcagaaa aaccgcgctg 840 cgggacatcc ccggaatgacggcgaatatc gttcctctgc gcctcgctgt gcagccgaag 900 gccacggtga gggagctcgtgaagcaggta tctcgcggag tacgagacgc cttgcggcat 960 cagcgatacc gctacgaggacatcctcaga gacctgaagc tcgtggggcg cgacggactc 1020 taccccctac tggtcaatatcgtctccttt gactacgatt tgagatttgg tgacgccccc 1080 agcattgcgc acgggctcggcggcataaac ttcaacgacc tgtcgatttc cgtgtacgac 1140 aggtcgtccg acggaagcatgtccgtggtt gtggacgcca atcccgacct ttacagccgt 1200 ggagcggtgc aagagcatgccgcgaaattc ctcgacgtga tgaactggat ggcgcgttcc 1260 gctgcggagg aacgcatccaccagatcacg ttgatgagcc gctcc 1305 20 435 PRT Streptomyces sp. 
(Ecopia strain) 20 Gly Gly Arg Arg Glu Leu Met Ala Gly Gln Leu Gly Leu Trp His Ala 1 5 10 15 Gln Gln Leu Asn Pro Asp Asn Pro Ile Tyr Asn Met Gly Glu Tyr Ile 20 25 30 Glu Ile Arg Gly Lys Val Asp Thr Ser Leu Phe Glu Ala Ala Val Arg 35 40 45 Arg Val Val Leu Glu Val Asp Gly Phe Gly Leu Arg Phe Glu Gly Gly 50 55 60 Ala Asp Glu Val Pro Arg Gln Tyr Phe Gly Leu Arg Ser Asp Trp Leu 65 70 75 80 Phe His Val Ile Asp Val Ser Gly Glu Glu Asp Pro Arg Ser Ser Ala 85 90 95 Glu Ser Trp Met Arg Ala Asp Met Arg Arg Pro Val Asp Leu Gln Val 100 105 110 Gly Glu Leu Phe Thr Gln Ala Ile Ile Lys Val Asp Glu Asp Leu Phe 115 120 125 Phe Trp Tyr Gln Arg Ile His His Ile Ile Ala Asp Gly Leu Ala Gly 130 135 140 Pro Arg Ile Ala Ser Arg Val Ala Ala Val Tyr Thr Ala Leu Ser Ala 145 150 155 160 Gly Glu Pro Leu Ala Asp Ser Ala Pro Pro Ser Ser Ser Val Leu Met 165 170 175 Asp Ala Asp Ala Asp Tyr Arg Ala Ser Pro Glu Phe Glu Leu Asp Arg 180 185 190 Gln Tyr Trp Thr Glu Arg Leu Ser Asp Arg Pro Gln Thr Val Ser Leu 195 200 205 Ser Gly Gln Glu Pro Ser Thr Thr Pro His Glu Leu Thr Arg His Thr 210 215 220 Leu His Ile Pro Pro Asp Ala Ala Ala Glu Leu Arg Ser Ser Ala Arg 225 230 235 240 Arg Leu Gly Thr Ser Leu Ser Gly Leu Ala Val Ala Ala Ser Ala Ala 245 250 255 Tyr Leu His Arg Ala Thr Gly Gln Glu Asp Ile Ile Leu Gly Val Pro 260 265 270 Val Met Gly Arg Lys Thr Ala Leu Arg Asp Ile Pro Gly Met Thr Ala 275 280 285 Asn Ile Val Pro Leu Arg Leu Ala Val Gln Pro Lys Ala Thr Val Arg 290 295 300 Glu Leu Val Lys Gln Val Ser Arg Gly Val Arg Asp Ala Leu Arg His 305 310 315 320 Gln Arg Tyr Arg Tyr Glu Asp Ile Leu Arg Asp Leu Lys Leu Val Gly 325 330 335 Arg Asp Gly Leu Tyr Pro Leu Leu Val Asn Ile Val Ser Phe Asp Tyr 340 345 350 Asp Leu Arg Phe Gly Asp Ala Pro Ser Ile Ala His Gly Leu Gly Gly 355 360 365 Ile Asn Phe Asn Asp Leu Ser Ile Ser Val Tyr Asp Arg Ser Ser Asp 370 375 380 Gly Ser Met Ser Val Val Val Asp Ala Asn Pro Asp Leu Tyr Ser Arg 385 390 395 400 Gly Ala Val Gln Glu His Ala Ala Lys Phe Leu Asp Val Met Asn Trp 405 410 415 Met Ala Arg Ser Ala Ala Glu Glu
Arg Ile His Gln Ile Thr Leu Met 420 425 430 Ser Arg Ser 435 21 1338DNA Streptomyces coelicolor 21 tcggttcggc acggtctgac gagcgcgcagcacgaggtgt ggctcgccca gcagctggat 60 ccgcgtggcg cgcactaccg gacgggatcctgcctggaga tcgacggacc cctggaccac 120 gcggtgctga gccgcgccct gcggctcaccgtggccggta cggagacgct ctgctcgcgc 180 ttcctcaccg acgaggaggg ccggccgtaccgcgcgtact gcccgcccgc gccggagggt 240 tcggccgccg tcgaggaccc ggacggggtgccgtacaccc ccgtgctgct gcgccacatc 300 gacctctccg gtcacgagga ccccgagggcgaggcccagc ggtggatgga ccgggaccgc 360 gcgacgccgc tgccgctgga ccggcccggcctgagcagcc acgcgctgtt cacgctcggc 420 gggggccggc acctgtacta cctgggcgtccaccacatcg tgatcgacgg caccagcatg 480 gccctgttct acgagcggct ggccgaggtgtaccgcgcgc tgcgggacgg gcgtgcggtg 540 cccgcggccg ccttcgggga cacggaccggatggtcgcgg gcgaggaggc ctaccgcgcg 600 tcggcgcggt acgagcgtga ccgggcctactggaccggcc tgttcaccga ccgccccgag 660 cccgtctcgc tcaccgggcg cggcggcggccgggccctcg cgccgaccgt gaggagcctg 720 ggcctgcccc cggagcgcac ggaggtgctcggccgggccg ccgaggcgac cggtgcgcac 780 tgggcgcgcg tggtcatcgc cggtgtggccgccttcctgc accggacgac gggcgcccgg 840 gacgtcgtgg tgtcggtgcc ggtcaccgggcgctacggcg cgaacgcccg gatcaccccc 900 ggcatggtct ccaaccggct gccgctgcggctggcggtgc gccccggcga gagtttcgcg 960 cgggtggtcg agaccgtgtc cgaggcgatgagcggcctcc tggcgcacag ccgcttccgc 1020 ggcgaggacc tcgaccggga gctgggcggcgcgggggtgt cggggcccac cgtcaacgtc 1080 atgccgtaca tcaggccggt ggacttcggcggtccggtcg gcctgatgcg cagcatcagt 1140 tcgggtccga ccaccgatct gaacatcgtgctgaccggca cccccgagtc cggcctgcgc 1200 gtcgacttcg agggcaaccc gcaggtgtacggcggccagg acctgacggt gctgcaggaa 1260 cgcttcgtcc ggttcctggc ggagctggcggccgaccccg cagccaccgt cgacgaggtc 1320 gcgctgctga cgccggac 1338 22 446PRT Streptomyces coelicolor 22 Ser Val Arg His Gly Leu Thr Ser Ala GlnHis Glu Val Trp Leu Ala 1 5 10 15 Gln Gln Leu Asp Pro Arg Gly Ala HisTyr Arg Thr Gly Ser Cys Leu 20 25 30 Glu Ile Asp Gly Pro Leu Asp His AlaVal Leu Ser Arg Ala Leu Arg 35 40 45 Leu Thr Val Ala Gly Thr Glu Thr LeuCys Ser Arg Phe Leu Thr Asp 50 55 60 Glu Glu Gly Arg Pro Tyr Arg 
Ala Tyr Cys Pro Pro Ala Pro Glu Gly 65 70 75 80 Ser Ala Ala Val Glu Asp Pro Asp Gly Val Pro Tyr Thr Pro Val Leu 85 90 95 Leu Arg His Ile Asp Leu Ser Gly His Glu Asp Pro Glu Gly Glu Ala 100 105 110 Gln Arg Trp Met Asp Arg Asp Arg Ala Thr Pro Leu Pro Leu Asp Arg 115 120 125 Pro Gly Leu Ser Ser His Ala Leu Phe Thr Leu Gly Gly Gly Arg His 130 135 140 Leu Tyr Tyr Leu Gly Val His His Ile Val Ile Asp Gly Thr Ser Met 145 150 155 160 Ala Leu Phe Tyr Glu Arg Leu Ala Glu Val Tyr Arg Ala Leu Arg Asp 165 170 175 Gly Arg Ala Val Pro Ala Ala Ala Phe Gly Asp Thr Asp Arg Met Val 180 185 190 Ala Gly Glu Glu Ala Tyr Arg Ala Ser Ala Arg Tyr Glu Arg Asp Arg 195 200 205 Ala Tyr Trp Thr Gly Leu Phe Thr Asp Arg Pro Glu Pro Val Ser Leu 210 215 220 Thr Gly Arg Gly Gly Gly Arg Ala Leu Ala Pro Thr Val Arg Ser Leu 225 230 235 240 Gly Leu Pro Pro Glu Arg Thr Glu Val Leu Gly Arg Ala Ala Glu Ala 245 250 255 Thr Gly Ala His Trp Ala Arg Val Val Ile Ala Gly Val Ala Ala Phe 260 265 270 Leu His Arg Thr Thr Gly Ala Arg Asp Val Val Val Ser Val Pro Val 275 280 285 Thr Gly Arg Tyr Gly Ala Asn Ala Arg Ile Thr Pro Gly Met Val Ser 290 295 300 Asn Arg Leu Pro Leu Arg Leu Ala Val Arg Pro Gly Glu Ser Phe Ala 305 310 315 320 Arg Val Val Glu Thr Val Ser Glu Ala Met Ser Gly Leu Leu Ala His 325 330 335 Ser Arg Phe Arg Gly Glu Asp Leu Asp Arg Glu Leu Gly Gly Ala Gly 340 345 350 Val Ser Gly Pro Thr Val Asn Val Met Pro Tyr Ile Arg Pro Val Asp 355 360 365 Phe Gly Gly Pro Val Gly Leu Met Arg Ser Ile Ser Ser Gly Pro Thr 370 375 380 Thr Asp Leu Asn Ile Val Leu Thr Gly Thr Pro Glu Ser Gly Leu Arg 385 390 395 400 Val Asp Phe Glu Gly Asn Pro Gln Val Tyr Gly Gly Gln Asp Leu Thr 405 410 415 Val Leu Gln Glu Arg Phe Val Arg Phe Leu Ala Glu Leu Ala Ala Asp 420 425 430 Pro Ala Ala Thr Val Asp Glu Val Ala Leu Leu Thr Pro Asp 435 440 445 23 1763 DNA Actinoplanes sp.
23 tggtcatcga cgccgccacc caacccaccg ttcccgacgccttccgggcg caggcgatcg 60 cgcgccccgg cgagcccgcc ctcgtggtgc tccccggcgacccggacgcc gagcccgtca 120 ccctcacgta cgccgagctc gaccgccgcg ccgcggcgcgggcggcctgg ctcgccgccc 180 ggttcccggc cggggagcgc atcctcatcg ccctgcccaccggcgccgag ttcgtcgagc 240 tctacctggc gtgcctctac gccggcctgg tcgccgtgccggcgcccccg cccggagggt 300 cgtccggcgc ctccgagcgc accgtcggca tcgcggccgactgctccccc gccctggccg 360 tcgtcaacgc cgacgacgcg gcgccgctca ccgccgtcctgcgcgagcgc ggcctgtccg 420 gcctgccggt cggtgcgctt ccgcccctcg cggcggaagcgatccgcccg ccccgcgggc 480 cccggccgga ctcgctggcc gtcctgcagt acagctcgggctccaccggc tcgcccaagg 540 gcgtgatgct cagccaccgg gccgtgctgg ccaacctccgcgcgttcgac cgcagcagcg 600 ggcacaacag cgacgacgtg ttcggcagct ggctgccgctgcaccacgac atgggcctgt 660 tcgccatgct caccgcgggc ctgctgaacg gcgccggcgtcgtgctgatg tcgccgacgg 720 ccttcgtccg ccggccggcg gactggctgc ggatgatggaccgctaccgg gtcaccatct 780 ccgccgcgcc caacttcgcg tacgacctgt gcgtgcgcgccgtgcgggac gagcagatcg 840 ccggcctcga cctgtcccgc atccgcacgc tctacaacggatcggagccg gtcaacccgg 900 ccaccgtccg ggcgttcacc gagcgcttcg cccccttcggcctgcacacc cacgcggtga 960 acccctgcta cggcatggcc gagttcaccg cgtacgtgtcgacgaaggtc ttcgaggcgc 1020 cggcggtctt tcttcccgcc gaccctcgcg cgctggaggacgccgcgtcg ccggccctgc 1080 gcccggccga ccccgccgcg gcccgggaga taccgggtgtcggccgggtg cccgacttcg 1140 aggtgctcat cgtcgacccg gacgggctac ggccgctgcccgagggccgg gtcggcgaga 1200 tctggctgcg cgggcccggc gcgggcgccg gctactggggcaggaccgag ctcaaccccg 1260 gcatcttcga cgccaggccc gcgggcgacg gccaggacggcggctgggtg cgaacgggtg 1320 acctgggtgc gctgaccgga ggcgagctgt tcctcaccggacgcctcaag gagctgctca 1380 tcgtgcacgg ccgcaacctg gccccgcacg acctcgagcgggaggcccgg gccgcgcacg 1440 acgcggtgga ccaccagatc ggggcggcgt tcggggtgccggcgcccgac gagcggatcg 1500 tgctggtgca ggaggtgcat ccgcgcacgc cgctcgacgagctgccgcgg gtggcgagcg 1560 ccgtcagccg ccggctcacc gtctccttcg gcgtgccggtacgcaacgtg ctgctggtgc 1620 ggcgcggcac ggtgcgccgg accacgagcg gcaagatccgccggaccgcg gtccgcgagc 1680 ggttcctggc cggcggcatc acggcgctgc acgccgagctcgagccggcg 
ctgcggccgg 1740 tgcaggcggg cgcgggccga tga 1763 24 587 PRT Actinoplanes sp. 24 Met Val Ile Asp Ala Ala Thr Gln Pro Thr Val Pro Asp Ala Phe Arg 1 5 10 15 Ala Gln Ala Ile Ala Arg Pro Gly Glu Pro Ala Leu Val Val Leu Pro 20 25 30 Gly Asp Pro Asp Ala Glu Pro Val Thr Leu Thr Tyr Ala Glu Leu Asp 35 40 45 Arg Arg Ala Ala Ala Arg Ala Ala Trp Leu Ala Ala Arg Phe Pro Ala 50 55 60 Gly Glu Arg Ile Leu Ile Ala Leu Pro Thr Gly Ala Glu Phe Val Glu 65 70 75 80 Leu Tyr Leu Ala Cys Leu Tyr Ala Gly Leu Val Ala Val Pro Ala Pro 85 90 95 Pro Pro Gly Gly Ser Ser Gly Ala Ser Glu Arg Thr Val Gly Ile Ala 100 105 110 Ala Asp Cys Ser Pro Ala Leu Ala Val Val Asn Ala Asp Asp Ala Ala 115 120 125 Pro Leu Thr Ala Val Leu Arg Glu Arg Gly Leu Ser Gly Leu Pro Val 130 135 140 Gly Ala Leu Pro Pro Leu Ala Ala Glu Ala Ile Arg Pro Pro Arg Gly 145 150 155 160 Pro Arg Pro Asp Ser Leu Ala Val Leu Gln Tyr Ser Ser Gly Ser Thr 165 170 175 Gly Ser Pro Lys Gly Val Met Leu Ser His Arg Ala Val Leu Ala Asn 180 185 190 Leu Arg Ala Phe Asp Arg Ser Ser Gly His Asn Ser Asp Asp Val Phe 195 200 205 Gly Ser Trp Leu Pro Leu His His Asp Met Gly Leu Phe Ala Met Leu 210 215 220 Thr Ala Gly Leu Leu Asn Gly Ala Gly Val Val Leu Met Ser Pro Thr 225 230 235 240 Ala Phe Val Arg Arg Pro Ala Asp Trp Leu Arg Met Met Asp Arg Tyr 245 250 255 Arg Val Thr Ile Ser Ala Ala Pro Asn Phe Ala Tyr Asp Leu Cys Val 260 265 270 Arg Ala Val Arg Asp Glu Gln Ile Ala Gly Leu Asp Leu Ser Arg Ile 275 280 285 Arg Thr Leu Tyr Asn Gly Ser Glu Pro Val Asn Pro Ala Thr Val Arg 290 295 300 Ala Phe Thr Glu Arg Phe Ala Pro Phe Gly Leu His Thr His Ala Val 305 310 315 320 Asn Pro Cys Tyr Gly Met Ala Glu Phe Thr Ala Tyr Val Ser Thr Lys 325 330 335 Val Phe Glu Ala Pro Ala Val Phe Leu Pro Ala Asp Pro Arg Ala Leu 340 345 350 Glu Asp Ala Ala Ser Pro Ala Leu Arg Pro Ala Asp Pro Ala Ala Ala 355 360 365 Arg Glu Ile Pro Gly Val Gly Arg Val Pro Asp Phe Glu Val Leu Ile 370 375 380 Val Asp Pro Asp Gly Leu Arg Pro Leu Pro Glu Gly Arg Val Gly Glu 385 390 395 400 Ile Trp Leu Arg Gly Pro Gly Ala Gly Ala Gly Tyr
Trp Gly Arg Thr 405 410 415 Glu Leu Asn Pro GlyIle Phe Asp Ala Arg Pro Ala Gly Asp Gly Gln 420 425 430 Asp Gly Gly TrpVal Arg Thr Gly Asp Leu Gly Ala Leu Thr Gly Gly 435 440 445 Glu Leu PheLeu Thr Gly Arg Leu Lys Glu Leu Leu Ile Val His Gly 450 455 460 Arg AsnLeu Ala Pro His Asp Leu Glu Arg Glu Ala Arg Ala Ala His 465 470 475 480Asp Ala Val Asp His Gln Ile Gly Ala Ala Phe Gly Val Pro Ala Pro 485 490495 Asp Glu Arg Ile Val Leu Val Gln Glu Val His Pro Arg Thr Pro Leu 500505 510 Asp Glu Leu Pro Arg Val Ala Ser Ala Val Ser Arg Arg Leu Thr Val515 520 525 Ser Phe Gly Val Pro Val Arg Asn Val Leu Leu Val Arg Arg GlyThr 530 535 540 Val Arg Arg Thr Thr Ser Gly Lys Ile Arg Arg Thr Ala ValArg Glu 545 550 555 560 Arg Phe Leu Ala Gly Gly Ile Thr Ala Leu His AlaGlu Leu Glu Pro 565 570 575 Ala Leu Arg Pro Val Gln Ala Gly Ala Gly Arg580 585 25 1803 DNA Streptomyces roseosporus 25 gtgcctgccg tgagtgagagccgctgtgcc gggcagggcc tggtgggggc actgcggacc 60 tgggcacgga cacgtgcccgggagactgcc gtggttctcg tacgggacac cggaaccacc 120 gacgacacgg cgtcggtggactacggacag ctggacgagt gggccagaag catcgcggtg 180 accctccgac agcaactcgcgccgggggga cgggcacttc tgctgctgcc gtccggcccg 240 gagttcacgg ccgcgtacctcggctgcctg tacgcgggtc tggccgccgt accggcgccg 300 ctgcccgggg ggcgccacttcgaacgccgc cgtgtcgcgg ccatcgccgc cgacagcgga 360 gccggcgtgg tgctgaccgtcgcgggtgag accgcctccg tccacgactg gctgaccgag 420 accacggccc cggctactcgcgtcgtggcc gtggacgacc gggcggcgct cggcgacccg 480 gcgcagtggg acgacccgggcgtcgcgccc gacgacgtgg ctctcatcca gtacacctcg 540 ggctcgaccg gcaaccccaagggcgtggtc gtgacccacg ccaacctgct ggcgaacgcg 600 cggaatctcg ccgaggcctgcgagctgacc gccgccactc ccatgggcgg ctggctgccc 660 atgtaccacg acatggggctcctgggcacg ctgacaccgg ccctgtacct cggcaccacg 720 tgcgtgctga tgagctccacggcattcatc aaacggccgc acctgtggct acggaccatc 780 gaccggttcg gcctggtctggtcgtcggct cccgacttcg cgtacgacat gtgtctgaag 840 cgcgtcaccg acgagcagatcgccgggctg gacctgtccc gctggcggtg ggccggcaac 900 ggcgcggagc ccatccgggcagccaccgta cgggccttcg gcgaacggtt cgcccggtac 960 ggcctgcgcc 
ccgaggcgctcaccgccggc tacgggctgg ccgaggccac cctgttcgtg 1020 tcgaggtcgc aggggctgcacacggcacga gtcgccaccg ccgccctcga acgccacgaa 1080 ttccgcctcg ccgtacccggcgaggcagcc cgggagatcg tcagctgcgg tcccgtcggc 1140 cacttccgcg cccgcatcgtcgaacccggc gggcaccgtg ttctgccgcc cggccaggtc 1200 ggcgagctgg tcctccagggagccgccgtc tgcgccggct actggcaggc caaggaggag 1260 accgagcaga ccttcggcctcaccctcgac ggcgaggacg gtcactggct gcgcaccggc 1320 gatctcgccg ccctgcacgaagggaatctc cacatcaccg gccgctgcaa agaggccctg 1380 gtgatacgag gacgcaatctgtacccgcag gacatcgagc acgaactccg cctgcaacac 1440 ccggaacttg agagcgtcggcgccgcgttc accgtcccgg cggcacctgg cacgccgggc 1500 ttgatggtgg tccacgaagtccgcaccccg gtccccgccg acgaccaccc ggccctggtc 1560 agcgccctgc gggggacgatcaaccgcgaa ttcggactcg acgcccaggg catcgccctg 1620 gtgagccgcg gcaccgtactgcgtaccacc agcggcaagg tccgccgggg cgccatgcgt 1680 gacctctgcc tccgcggggagctgaacatc gtccacgcgg acaagggctg gcacgccatc 1740 gccggcacgg ccggagaggacatcgccccc actgaccacg ctccacatcc gcaccccgcg 1800 taa 1803 26 600 PRTStreptomyces roseosporus 26 Val Pro Ala Val Ser Glu Ser Arg Cys Ala GlyGln Gly Leu Val Gly 1 5 10 15 Ala Leu Arg Thr Trp Ala Arg Thr Arg AlaArg Glu Thr Ala Val Val 20 25 30 Leu Val Arg Asp Thr Gly Thr Thr Asp AspThr Ala Ser Val Asp Tyr 35 40 45 Gly Gln Leu Asp Glu Trp Ala Arg Ser IleAla Val Thr Leu Arg Gln 50 55 60 Gln Leu Ala Pro Gly Gly Arg Ala Leu LeuLeu Leu Pro Ser Gly Pro 65 70 75 80 Glu Phe Thr Ala Ala Tyr Leu Gly CysLeu Tyr Ala Gly Leu Ala Ala 85 90 95 Val Pro Ala Pro Leu Pro Gly Gly ArgHis Phe Glu Arg Arg Arg Val 100 105 110 Ala Ala Ile Ala Ala Asp Ser GlyAla Gly Val Val Leu Thr Val Ala 115 120 125 Gly Glu Thr Ala Ser Val HisAsp Trp Leu Thr Glu Thr Thr Ala Pro 130 135 140 Ala Thr Arg Val Val AlaVal Asp Asp Arg Ala Ala Leu Gly Asp Pro 145 150 155 160 Ala Gln Trp AspAsp Pro Gly Val Ala Pro Asp Asp Val Ala Leu Ile 165 170 175 Gln Tyr ThrSer Gly Ser Thr Gly Asn Pro Lys Gly Val Val Val Thr 180 185 190 His AlaAsn Leu Leu Ala Asn Ala Arg Asn Leu Ala Glu Ala Cys Glu 195 200 205 LeuThr Ala Ala Thr 
Pro Met Gly Gly Trp Leu Pro Met Tyr His Asp 210 215 220Met Gly Leu Leu Gly Thr Leu Thr Pro Ala Leu Tyr Leu Gly Thr Thr 225 230235 240 Cys Val Leu Met Ser Ser Thr Ala Phe Ile Lys Arg Pro His Leu Trp245 250 255 Leu Arg Thr Ile Asp Arg Phe Gly Leu Val Trp Ser Ser Ala ProAsp 260 265 270 Phe Ala Tyr Asp Met Cys Leu Lys Arg Val Thr Asp Glu GlnIle Ala 275 280 285 Gly Leu Asp Leu Ser Arg Trp Arg Trp Ala Gly Asn GlyAla Glu Pro 290 295 300 Ile Arg Ala Ala Thr Val Arg Ala Phe Gly Glu ArgPhe Ala Arg Tyr 305 310 315 320 Gly Leu Arg Pro Glu Ala Leu Thr Ala GlyTyr Gly Leu Ala Glu Ala 325 330 335 Thr Leu Phe Val Ser Arg Ser Gln GlyLeu His Thr Ala Arg Val Ala 340 345 350 Thr Ala Ala Leu Glu Arg His GluPhe Arg Leu Ala Val Pro Gly Glu 355 360 365 Ala Ala Arg Glu Ile Val SerCys Gly Pro Val Gly His Phe Arg Ala 370 375 380 Arg Ile Val Glu Pro GlyGly His Arg Val Leu Pro Pro Gly Gln Val 385 390 395 400 Gly Glu Leu ValLeu Gln Gly Ala Ala Val Cys Ala Gly Tyr Trp Gln 405 410 415 Ala Lys GluGlu Thr Glu Gln Thr Phe Gly Leu Thr Leu Asp Gly Glu 420 425 430 Asp GlyHis Trp Leu Arg Thr Gly Asp Leu Ala Ala Leu His Glu Gly 435 440 445 AsnLeu His Ile Thr Gly Arg Cys Lys Glu Ala Leu Val Ile Arg Gly 450 455 460Arg Asn Leu Tyr Pro Gln Asp Ile Glu His Glu Leu Arg Leu Gln His 465 470475 480 Pro Glu Leu Glu Ser Val Gly Ala Ala Phe Thr Val Pro Ala Ala Pro485 490 495 Gly Thr Pro Gly Leu Met Val Val His Glu Val Arg Thr Pro ValPro 500 505 510 Ala Asp Asp His Pro Ala Leu Val Ser Ala Leu Arg Gly ThrIle Asn 515 520 525 Arg Glu Phe Gly Leu Asp Ala Gln Gly Ile Ala Leu ValSer Arg Gly 530 535 540 Thr Val Leu Arg Thr Thr Ser Gly Lys Val Arg ArgGly Ala Met Arg 545 550 555 560 Asp Leu Cys Leu Arg Gly Glu Leu Asn IleVal His Ala Asp Lys Gly 565 570 575 Trp His Ala Ile Ala Gly Thr Ala GlyGlu Asp Ile Ala Pro Thr Asp 580 585 590 His Ala Pro His Pro His Pro Ala595 600 27 1785 DNA Streptomyces ghanaensis 27 atggtcaacg tcagtgaagcgcggagtgtc cccgaactcc tgcggcatca cgcgagttcg 60 gcacccgacc gggaggcgctgcgctacctg cgcgacacca cggggacgga 
cgggaccccg 120 ctcacctacc gggaagtggaccgcgctgcc gccgccgtgg cacggcgcct ctcccggagc 180 ttcgaggcgg gcgaccggctgctgctcctg cactccttcg gcccggactt catcgtgggc 240 ttcctcgcct gcctctacgccggcatggtg gccgttcccg cgccgctgcc cggcagatac 300 cgccatgaac gcagacgggtgctgagcatc gcccacgaca gcggcgccgt cgcggtgctc 360 accgacgacg cgagctccgcggaggtcggc gagtggatgc gcgaggaggg cctggacggc 420 ctccccctca tcgccaccgactggtccgcc gaggagcccg gtgccttcac cccggccgcg 480 gacctcggac gcgagacgctcgccatgctc cagtacacct ccggctccac gggcgagccg 540 aagggcgtga tggtgacgcacgggaacctg ctgcggaacg tcaccgcgct gagccgtgcg 600 ttcggcctcg acgagcacacccacttcggc ggctggatcc cccatttcca cgacatgggc 660 ctgatcgggc tgctgctgccctccctcttc ctgcgcagca ggtgcgtgct gatgagcccg 720 tccgccttca tccgccgaccgcacacctgg ctgaagatga tcgacgactt cgacgtcgcg 780 tggtcggcgg cccccgatttcgcctacgaa ctgtgctgcc gacgagtcac cgacgagcag 840 ctcggcagcc tcgacctctcgcgctggcgc tacgcgggca acggctcgga acccatccac 900 gccggcacca tcaccgccttcgccgagcgg ttcgccgccg ccgggttccg cgccgagtcg 960 ctgtccccgt gctacggcctcgccgagtcg acggtctacg tctccggcgg tccctccgcc 1020 cggatcaccg cggtcgacgcccagtcgctg gaggaccacc ggctcggcga ggccgtaccg 1080 ggacggccgc accgctcgctggtgagctgc ggcgcgccgg cggacgtcga cctccggatc 1140 gtcgacccgc ggaccggggacccgctgccg gacggcgcgg tcggggagat ctggctgcgg 1200 ggaggcagcg tcgccgtcggctactgggac aaccccgcgg cgtccgccga gaccttcggc 1260 gccgtcatcg acggggtggagggccgctat ctgcggaccg gtgacctcgg cgcgctgtac 1320 gacggggagc tgtacgtcacgggccggatc aaggaaatga tcaccgtgca cgggaggaac 1380 gtctacccac aggacgtcgagcaggaactg cgcgccgccc accaggaact cgccggctgc 1440 gtcggcgccg tcttcgccctctcggatacg ggccccgacc cggtcctggt cgtgtcccac 1500 gaggtccggg cgggcctcggtgcggacgta ctggaggcgc tggcccggga catgaagcag 1560 acggtggccc gcgagatgggcatgacggcg tcgtgcgtgg tgctcctgcg ccgcgggacg 1620 gtacgccgga cgacgagcggcaagatccag cgcgacgcga tgcggaagct gttccgggac 1680 ggcgagctga agccccttcacacgcactgg cacatgccga ggcagcgcgc cgcggtccac 1740 gggagcagct cggcgcagagcctggccgag gagtccacgg tatga 1785 28 594 PRT Streptomyces ghanaensis 28Met Val Asn Val 
Ser Glu Ala Arg Ser Val Pro Glu Leu Leu Arg His 1 5 1015 His Ala Ser Ser Ala Pro Asp Arg Glu Ala Leu Arg Tyr Leu Arg Asp 20 2530 Thr Thr Gly Thr Asp Gly Thr Pro Leu Thr Tyr Arg Glu Val Asp Arg 35 4045 Ala Ala Ala Ala Val Ala Arg Arg Leu Ser Arg Ser Phe Glu Ala Gly 50 5560 Asp Arg Leu Leu Leu Leu His Ser Phe Gly Pro Asp Phe Ile Val Gly 65 7075 80 Phe Leu Ala Cys Leu Tyr Ala Gly Met Val Ala Val Pro Ala Pro Leu 8590 95 Pro Gly Arg Tyr Arg His Glu Arg Arg Arg Val Leu Ser Ile Ala His100 105 110 Asp Ser Gly Ala Val Ala Val Leu Thr Asp Asp Ala Ser Ser AlaGlu 115 120 125 Val Gly Glu Trp Met Arg Glu Glu Gly Leu Asp Gly Leu ProLeu Ile 130 135 140 Ala Thr Asp Trp Ser Ala Glu Glu Pro Gly Ala Phe ThrPro Ala Ala 145 150 155 160 Asp Leu Gly Arg Glu Thr Leu Ala Met Leu GlnTyr Thr Ser Gly Ser 165 170 175 Thr Gly Glu Pro Lys Gly Val Met Val ThrHis Gly Asn Leu Leu Arg 180 185 190 Asn Val Thr Ala Leu Ser Arg Ala PheGly Leu Asp Glu His Thr His 195 200 205 Phe Gly Gly Trp Ile Pro His PheHis Asp Met Gly Leu Ile Gly Leu 210 215 220 Leu Leu Pro Ser Leu Phe LeuArg Ser Arg Cys Val Leu Met Ser Pro 225 230 235 240 Ser Ala Phe Ile ArgArg Pro His Thr Trp Leu Lys Met Ile Asp Asp 245 250 255 Phe Asp Val AlaTrp Ser Ala Ala Pro Asp Phe Ala Tyr Glu Leu Cys 260 265 270 Cys Arg ArgVal Thr Asp Glu Gln Leu Gly Ser Leu Asp Leu Ser Arg 275 280 285 Trp ArgTyr Ala Gly Asn Gly Ser Glu Pro Ile His Ala Gly Thr Ile 290 295 300 ThrAla Phe Ala Glu Arg Phe Ala Ala Ala Gly Phe Arg Ala Glu Ser 305 310 315320 Leu Ser Pro Cys Tyr Gly Leu Ala Glu Ser Thr Val Tyr Val Ser Gly 325330 335 Gly Pro Ser Ala Arg Ile Thr Ala Val Asp Ala Gln Ser Leu Glu Asp340 345 350 His Arg Leu Gly Glu Ala Val Pro Gly Arg Pro His Arg Ser LeuVal 355 360 365 Ser Cys Gly Ala Pro Ala Asp Val Asp Leu Arg Ile Val AspPro Arg 370 375 380 Thr Gly Asp Pro Leu Pro Asp Gly Ala Val Gly Glu IleTrp Leu Arg 385 390 395 400 Gly Gly Ser Val Ala Val Gly Tyr Trp Asp AsnPro Ala Ala Ser Ala 405 410 415 Glu Thr Phe Gly Ala Val Ile Asp Gly ValGlu Gly Arg Tyr Leu Arg 
420 425 430 Thr Gly Asp Leu Gly Ala Leu Tyr AspGly Glu Leu Tyr Val Thr Gly 435 440 445 Arg Ile Lys Glu Met Ile Thr ValHis Gly Arg Asn Val Tyr Pro Gln 450 455 460 Asp Val Glu Gln Glu Leu ArgAla Ala His Gln Glu Leu Ala Gly Cys 465 470 475 480 Val Gly Ala Val PheAla Leu Ser Asp Thr Gly Pro Asp Pro Val Leu 485 490 495 Val Val Ser HisGlu Val Arg Ala Gly Leu Gly Ala Asp Val Leu Glu 500 505 510 Ala Leu AlaArg Asp Met Lys Gln Thr Val Ala Arg Glu Met Gly Met 515 520 525 Thr AlaSer Cys Val Val Leu Leu Arg Arg Gly Thr Val Arg Arg Thr 530 535 540 ThrSer Gly Lys Ile Gln Arg Asp Ala Met Arg Lys Leu Phe Arg Asp 545 550 555560 Gly Glu Leu Lys Pro Leu His Thr His Trp His Met Pro Arg Gln Arg 565570 575 Ala Ala Val His Gly Ser Ser Ser Ala Gln Ser Leu Ala Glu Glu Ser580 585 590 Thr Val 29 1806 DNA Streptomyces refuineus 29 gtggctcacgtgagcggacc cccagcagac ccgccggccg gctcccacct ggtggccgcg 60 atccgcgcgacggccgaggc cgaccccgag cgcaaggccg tcggcttcgt ccgggatccg 120 gaacgcgaaggtgaggaggc gctgcggagc tactcctggc tcgacgacag ggcccgccgc 180 atcgccgtcctcctccgcgg ggcgcggctc ggcgcgggct cgcgcgtcct gctgctcttc 240 ccgcagtccgcggagttcgc ggcggcctac gccggatgcc tctacggggg gatggtcgcc 300 gtccccgcgcccctgcccac gggaacctcc ctggagaccg cacgcgtcgc cggcatcgcc 360 cgggacgccggggcgggcgc cgtcctcacc gtctccgaca ccgaggcgga ggtccggcgg 420 tgggcggccgagaccggtct gggcgacctg cccctgttct ccgtcgacga actgcccgac 480 gacaccgacccgggggagtg gcgggagccg gagatccggg ccggcaccgt ggcggtgctg 540 cagtacacctccggctccac cggcagcccc aagggggtcg tcgtcaccca cggcgcgctc 600 gccgacaacgtccgcagcct cctgtccggg ttcgacctgg gaaccggcgc ccggctgggc 660 ggctggctgccgatgtacca cgacatgggg ctgttcgggc tgctgagccc ggcgctgttc 720 agcggcggcgccgccgtgct gatgagcggc agcgccttcc tgcgcaggcc gcacctgtgg 780 ccgacgctgatcgaccgctt cggcgtggtc ttctccgcgg cgcccgactt cgcctacgac 840 tactgcgtacggcgggtgga gcccgagcag gtggaccggc tcgacctctc gcgctggcgc 900 tgggcggccaacggctcgga gcccatccgg gccgagacgc tccgcgcctt caccaaggag 960 ttcgcccccgcggggctgcc ccacgacgcg atgaccccct gctacggact ggccgaggcg 1020 
accctgctggtctccctgtc ggcgggcgag ctgcgcaccc ggcgggtgga cgccgcggca 1080 ctggagaaccaccgcttcgt cgaggcggcc gcgggccgcc cgtcccgcga ggtcgtctcg 1140 tgcggccggcccccggccct ggaggtccgc gtggccgacc ccgcgaccgg agagcccgtc 1200 acgggcgatgcggtgggcga gatccaggtg cggggcgcga gcgtggccgg cggctactgg 1260 cggaaaccggaggcgaccgc cgagacgttc gtcacggccg cggacggctc cgggccctgg 1320 ctgcgcaccggcgacctcgg cgccctgtac gagggcgagc tgtacgtcac cggccgcatc 1380 aaggaactcctcatcgtgca cggccgcaac atctacccgc acgacgtcga gcgcgaactg 1440 cgcgcccaccacgacgagct cggcgcgatc ggcgccgtct tctccgtccc cacggaggag 1500 ggcgaggccgtcgtggtcac gcacgaggtg gtcccgtccg tccgggacga ccggggcccc 1560 gcgctggtgacggcggtacg ggcgacgctc gcccgggagt tcggcctggc accggccggg 1620 gtggtgctggtgcgccgcgg ccgcaccccg cgcaccagca gcggcaaggt gcagcgccgc 1680 ctggccgcccggctcttccg caccggggaa ctcgcccagg tccacgccga ccccggtgcc 1740 caccggctcgtggcggcgct ccgcgaggcg gacggcctgc gcgacgcccc cgcgtccacg 1800 acatga 180630 601 PRT Streptomyces refuineus 30 Val Ala His Val Ser Gly Pro Pro AlaAsp Pro Pro Ala Gly Ser His 1 5 10 15 Leu Val Ala Ala Ile Arg Ala ThrAla Glu Ala Asp Pro Glu Arg Lys 20 25 30 Ala Val Gly Phe Val Arg Asp ProGlu Arg Glu Gly Glu Glu Ala Leu 35 40 45 Arg Ser Tyr Ser Trp Leu Asp AspArg Ala Arg Arg Ile Ala Val Leu 50 55 60 Leu Arg Gly Ala Arg Leu Gly AlaGly Ser Arg Val Leu Leu Leu Phe 65 70 75 80 Pro Gln Ser Ala Glu Phe AlaAla Ala Tyr Ala Gly Cys Leu Tyr Gly 85 90 95 Gly Met Val Ala Val Pro AlaPro Leu Pro Thr Gly Thr Ser Leu Glu 100 105 110 Thr Ala Arg Val Ala GlyIle Ala Arg Asp Ala Gly Ala Gly Ala Val 115 120 125 Leu Thr Val Ser AspThr Glu Ala Glu Val Arg Arg Trp Ala Ala Glu 130 135 140 Thr Gly Leu GlyAsp Leu Pro Leu Phe Ser Val Asp Glu Leu Pro Asp 145 150 155 160 Asp ThrAsp Pro Gly Glu Trp Arg Glu Pro Glu Ile Arg Ala Gly Thr 165 170 175 ValAla Val Leu Gln Tyr Thr Ser Gly Ser Thr Gly Ser Pro Lys Gly 180 185 190Val Val Val Thr His Gly Ala Leu Ala Asp Asn Val Arg Ser Leu Leu 195 200205 Ser Gly Phe Asp Leu Gly Thr Gly Ala Arg Leu Gly Gly Trp Leu Pro 210215 220 
Met Tyr His Asp Met Gly Leu Phe Gly Leu Leu Ser Pro Ala Leu Phe225 230 235 240 Ser Gly Gly Ala Ala Val Leu Met Ser Gly Ser Ala Phe LeuArg Arg 245 250 255 Pro His Leu Trp Pro Thr Leu Ile Asp Arg Phe Gly ValVal Phe Ser 260 265 270 Ala Ala Pro Asp Phe Ala Tyr Asp Tyr Cys Val ArgArg Val Glu Pro 275 280 285 Glu Gln Val Asp Arg Leu Asp Leu Ser Arg TrpArg Trp Ala Ala Asn 290 295 300 Gly Ser Glu Pro Ile Arg Ala Glu Thr LeuArg Ala Phe Thr Lys Glu 305 310 315 320 Phe Ala Pro Ala Gly Leu Pro HisAsp Ala Met Thr Pro Cys Tyr Gly 325 330 335 Leu Ala Glu Ala Thr Leu LeuVal Ser Leu Ser Ala Gly Glu Leu Arg 340 345 350 Thr Arg Arg Val Asp AlaAla Ala Leu Glu Asn His Arg Phe Val Glu 355 360 365 Ala Ala Ala Gly ArgPro Ser Arg Glu Val Val Ser Cys Gly Arg Pro 370 375 380 Pro Ala Leu GluVal Arg Val Ala Asp Pro Ala Thr Gly Glu Pro Val 385 390 395 400 Thr GlyAsp Ala Val Gly Glu Ile Gln Val Arg Gly Ala Ser Val Ala 405 410 415 GlyGly Tyr Trp Arg Lys Pro Glu Ala Thr Ala Glu Thr Phe Val Thr 420 425 430Ala Ala Asp Gly Ser Gly Pro Trp Leu Arg Thr Gly Asp Leu Gly Ala 435 440445 Leu Tyr Glu Gly Glu Leu Tyr Val Thr Gly Arg Ile Lys Glu Leu Leu 450455 460 Ile Val His Gly Arg Asn Ile Tyr Pro His Asp Val Glu Arg Glu Leu465 470 475 480 Arg Ala His His Asp Glu Leu Gly Ala Ile Gly Ala Val PheSer Val 485 490 495 Pro Thr Glu Glu Gly Glu Ala Val Val Val Thr His GluVal Val Pro 500 505 510 Ser Val Arg Asp Asp Arg Gly Pro Ala Leu Val ThrAla Val Arg Ala 515 520 525 Thr Leu Ala Arg Glu Phe Gly Leu Ala Pro AlaGly Val Val Leu Val 530 535 540 Arg Arg Gly Arg Thr Pro Arg Thr Ser SerGly Lys Val Gln Arg Arg 545 550 555 560 Leu Ala Ala Arg Leu Phe Arg ThrGly Glu Leu Ala Gln Val His Ala 565 570 575 Asp Pro Gly Ala His Arg LeuVal Ala Ala Leu Arg Glu Ala Asp Gly 580 585 590 Leu Arg Asp Ala Pro AlaSer Thr Thr 595 600 31 1743 DNA Streptomyces aizunensis 31 atgaccctcgaacccagcgt gctccatctg ctgcgccggc acgccgtcga ccgggcagag 60 cggaccgccgtcaccttcgt ccacgacttc gacgcggccg acggctcgcg gagcctgaac 120 tacgccgaactcgacgcgga ggcacgtcgc 
gtcgcgtcct ggctccagga gcgctgtgcg 180 cccggagaccgggtgctgct gctgcacccg gccggtctgc ccttcgtcac cgcgttcctc 240 gcctgcctctacgcgggtgt catcgcggtg ccgtctccga tgccgggcca gttccagtac 300 cagcagcgccgcgtgacgac gatcgcccgc gatgccggtg tcagcgtggc gctcaccgac 360 acgggccagctgcccgaggc gcagcagtgg atggccgaca cccgcctcga actgccggtc 420 gccgcgagcgacgcccccgg cttcggtgac gcgtcgcgct ggcgcgaccc cggcgccacc 480 gcccaggacgtggtgctgct gcagtacacc tccggctcga ccggtgaccc caagggcgtc 540 atggtcacgcacgccaacct gctgcacaac gccgacagcc tgagccgttc cctcggcttc 600 accgaggacaccaacttcgg cggctggatc ccgctctacc acgacatggg cctgatgggg 660 cagctgctgccgggtctctt cctgggcagc agcgtcgcgc tgatgtcgcc gatggcgttc 720 ctcaagcgcccgcaccactg gctcgcgctg atcgaccgct acgacatcgg cttctccgcc 780 gcgcccaacttcgcgtacga gctgtgcctg cgccgggtca ccgacgcgca gatcgccgca 840 ctcgacctgtcgcgctggca gttcgccgcc aacggctccg agccgatcca ggccagcacc 900 ctgcgggagttcgcggagcg cttcggcccg gccggcttcc gggccgagca gctcgccccg 960 tgctacggcatggccgaggc gacggtcttc atctccggcc gctcgacccg gccgccgcgg 1020 atccgcgccgtcgacccgca ggcgctggag aagcacgtcg tccaggaccc cgagccgggc 1080 ggcctcgtgcgcgaactcgt cggctgcggc gacgtacccg acctcgacgt gcgcatcgtc 1140 gaagcgggcacgcgcacggt gctgacggac ggcacgaccg gcgagatctg gctgcgcggg 1200 ccgagcgtcgcggccggcta ctggaaccgg ccggaggtga ccgaggagat cttccgcgcc 1260 cacaccgccgacggcgacgg gccctacatg cgcaccggcg acctcggagt gctgctcgac 1320 ggcgaaatctacgtcacggg ccgcaccaag gacctgctga tcgtcaacgg ccgcaacctc 1380 tacccgcacgacctcgaaca cgaactgcgg ctctcccacg ccccgttggc gaccctcgcc 1440 ggtacggcgttcaccgtccc cgccccgcag gaagaggtcg tggtcgtgca cgaggtgcgc 1500 ggccgcttcagccaggaaga gctgcgcgag ctggccatcg gcatgcgcgc caccgtgcac 1560 cgcgagttcggcgtgcacac cgcgggcatc gtgctgatgc ggcccggcac ggtccgcaag 1620 accaccagcggcaaggtgca gcgcgccgag atgcgcggcc tgttcctcgc gggcgccctc 1680 gccccgctgtacgaggagat ggcgcccgga gtccaggcgg cgatggccgg agccgccggg 1740 tga 1743 32580 PRT Streptomyces aizunensis 32 Met Thr Leu Glu Pro Ser Val Leu HisLeu Leu Arg Arg His Ala Val 1 5 10 15 Asp Arg Ala Glu Arg Thr Ala ValThr 
Phe Val His Asp Phe Asp Ala 20 25 30 Ala Asp Gly Ser Arg Ser Leu AsnTyr Ala Glu Leu Asp Ala Glu Ala 35 40 45 Arg Arg Val Ala Ser Trp Leu GlnGlu Arg Cys Ala Pro Gly Asp Arg 50 55 60 Val Leu Leu Leu His Pro Ala GlyLeu Pro Phe Val Thr Ala Phe Leu 65 70 75 80 Ala Cys Leu Tyr Ala Gly ValIle Ala Val Pro Ser Pro Met Pro Gly 85 90 95 Gln Phe Gln Tyr Gln Gln ArgArg Val Thr Thr Ile Ala Arg Asp Ala 100 105 110 Gly Val Ser Val Ala LeuThr Asp Thr Gly Gln Leu Pro Glu Ala Gln 115 120 125 Gln Trp Met Ala AspThr Arg Leu Glu Leu Pro Val Ala Ala Ser Asp 130 135 140 Ala Pro Gly PheGly Asp Ala Ser Arg Trp Arg Asp Pro Gly Ala Thr 145 150 155 160 Ala GlnAsp Val Val Leu Leu Gln Tyr Thr Ser Gly Ser Thr Gly Asp 165 170 175 ProLys Gly Val Met Val Thr His Ala Asn Leu Leu His Asn Ala Asp 180 185 190Ser Leu Ser Arg Ser Leu Gly Phe Thr Glu Asp Thr Asn Phe Gly Gly 195 200205 Trp Ile Pro Leu Tyr His Asp Met Gly Leu Met Gly Gln Leu Leu Pro 210215 220 Gly Leu Phe Leu Gly Ser Ser Val Ala Leu Met Ser Pro Met Ala Phe225 230 235 240 Leu Lys Arg Pro His His Trp Leu Ala Leu Ile Asp Arg TyrAsp Ile 245 250 255 Gly Phe Ser Ala Ala Pro Asn Phe Ala Tyr Glu Leu CysLeu Arg Arg 260 265 270 Val Thr Asp Ala Gln Ile Ala Ala Leu Asp Leu SerArg Trp Gln Phe 275 280 285 Ala Ala Asn Gly Ser Glu Pro Ile Gln Ala SerThr Leu Arg Glu Phe 290 295 300 Ala Glu Arg Phe Gly Pro Ala Gly Phe ArgAla Glu Gln Leu Ala Pro 305 310 315 320 Cys Tyr Gly Met Ala Glu Ala ThrVal Phe Ile Ser Gly Arg Ser Thr 325 330 335 Arg Pro Pro Arg Ile Arg AlaVal Asp Pro Gln Ala Leu Glu Lys His 340 345 350 Val Val Gln Asp Pro GluPro Gly Gly Leu Val Arg Glu Leu Val Gly 355 360 365 Cys Gly Asp Val ProAsp Leu Asp Val Arg Ile Val Glu Ala Gly Thr 370 375 380 Arg Thr Val LeuThr Asp Gly Thr Thr Gly Glu Ile Trp Leu Arg Gly 385 390 395 400 Pro SerVal Ala Ala Gly Tyr Trp Asn Arg Pro Glu Val Thr Glu Glu 405 410 415 IlePhe Arg Ala His Thr Ala Asp Gly Asp Gly Pro Tyr Met Arg Thr 420 425 430Gly Asp Leu Gly Val Leu Leu Asp Gly Glu Ile Tyr Val Thr Gly Arg 435 440445 Thr Lys 
Asp Leu Leu Ile Val Asn Gly Arg Asn Leu Tyr Pro His Asp 450455 460 Leu Glu His Glu Leu Arg Leu Ser His Ala Pro Leu Ala Thr Leu Ala465 470 475 480 Gly Thr Ala Phe Thr Val Pro Ala Pro Gln Glu Glu Val ValVal Val 485 490 495 His Glu Val Arg Gly Arg Phe Ser Gln Glu Glu Leu ArgGlu Leu Ala 500 505 510 Ile Gly Met Arg Ala Thr Val His Arg Glu Phe GlyVal His Thr Ala 515 520 525 Gly Ile Val Leu Met Arg Pro Gly Thr Val ArgLys Thr Thr Ser Gly 530 535 540 Lys Val Gln Arg Ala Glu Met Arg Gly LeuPhe Leu Ala Gly Ala Leu 545 550 555 560 Ala Pro Leu Tyr Glu Glu Met AlaPro Gly Val Gln Ala Ala Met Ala 565 570 575 Gly Ala Ala Gly 580 33 1767DNA Actinomycete 33 atggtggagc tgggctcggc cgaaagcatt ccggcggtgctgcgccggca cgcggagaat 60 acgcccgacc gcgccgccca cgcctttgtc accgacctcgacgaggccgg cggcgtcgcc 120 tggctcagcc acgccgagct ggaccgccgg gcccgggccgtggccgcgca gctgtccgcg 180 cacgccgctc ccggcgaccg gatgctgctg ctgcacccggccggcccgga cttcctgatc 240 gcgctgctcg gctgcctgca cgccggtctg atcgcggtgccgtcgccgct gcccggccgc 300 tacgcccatc agcggcgccg ggtccggctg atcgcggccgacgccgacgt gacggccgtg 360 ctgaccgacc gggccacccg cgcggaggtc gtcgagtgggccgccgagca gggactgccc 420 gacatcgcgg tgctgacccc cgacccggag gccgacccgggtgactggca gccgccgccg 480 ctgagccggg acacggtcgc cgtcctgcag tacacctccggctccaccgg caaccccaag 540 ggcgtggtca tcgaccacgg caacatcctg agcaacgccgccacgatcat cgcggtgacc 600 gggatccggc ccggcaccgt gatcggcggg tggctgccgcacttccacga catgggcctg 660 atgggcctgc tgctgccgcc gctgctggcc ggggcgacgacggtgctgag cagccccgtc 720 tcgttcctga agcgcccgct gagctggctg cggatgatcgaccggtacgg cgtcgagatc 780 accgccgccc ccgacttcgc ctacgacctg tgcgtcgccaaggtcaccga cgccgagctg 840 gccacgctcg acctgtcccg ctggcgggtc gccatcaacggctccgagcc ggtccgggcc 900 gccgtgctca cccggttccg gcagcgcttc gccgccgccgggctgcggcc cgaggtgctg 960 accccgagct tcggcatggc cgaggcgacg ctgttcgtctccggcgaccc ggccaccccg 1020 ttcgtcgtcc gccgcgtcga caccgaccgg ctggcccggcaccggttcga gccggccccg 1080 gacggcgggc cgggccgcga cgtggtggcc tgcggcgcgccggccggcgt cgaggtgcgc 1140 atcgtcgacc ccggcagcgg cgacccgttg 
ccggacggcgccgtcggcga gatctggctg 1200 cgcggcccgt cgatcggccg cggctactgg gggcgcgccgcgaacaccgc gggcttcggc 1260 gcggtcacca gcatcggcga cgccgggtac ctgcggaccggcgacctggg caccctgtac 1320 gagggccagc tctacgtcac cggccgccgc aaggacatgctggtgctgcg cggccgcaac 1380 tactacccgc aggacatcga gcacgagctg cgggcgcaccaccccgagct ggccgggcgc 1440 gtcggcgcct gcttcgccgt gcgatcgcgc gacggggcgggcggcggcga ggaggtcctc 1500 gtggtcaccc acgaggtgcg cgggatctcc gatccggaccggctgcgtac ccttgccggg 1560 gccatgcggc tcacggtggc ccgggagttc ggcgtgccgagcgctgcggt gctgctgctg 1620 cgccccggcg cggtggcccg taccaccagc ggcaagatccagcgctcggc gatgcgggag 1680 ctgttcgaga ccggcgcgct ggagccggtc ggcggcgaggtggacgaccg gctggtcgcc 1740 accgcggcgc tgggcgcggc ccgatga 1767 34 588 PRTActinomycete 34 Met Val Glu Leu Gly Ser Ala Glu Ser Ile Pro Ala Val LeuArg Arg 1 5 10 15 His Ala Glu Asn Thr Pro Asp Arg Ala Ala His Ala PheVal Thr Asp 20 25 30 Leu Asp Glu Ala Gly Gly Val Ala Trp Leu Ser His AlaGlu Leu Asp 35 40 45 Arg Arg Ala Arg Ala Val Ala Ala Gln Leu Ser Ala HisAla Ala Pro 50 55 60 Gly Asp Arg Met Leu Leu Leu His Pro Ala Gly Pro AspPhe Leu Ile 65 70 75 80 Ala Leu Leu Gly Cys Leu His Ala Gly Leu Ile AlaVal Pro Ser Pro 85 90 95 Leu Pro Gly Arg Tyr Ala His Gln Arg Arg Arg ValArg Leu Ile Ala 100 105 110 Ala Asp Ala Asp Val Thr Ala Val Leu Thr AspArg Ala Thr Arg Ala 115 120 125 Glu Val Val Glu Trp Ala Ala Glu Gln GlyLeu Pro Asp Ile Ala Val 130 135 140 Leu Thr Pro Asp Pro Glu Ala Asp ProGly Asp Trp Gln Pro Pro Pro 145 150 155 160 Leu Ser Arg Asp Thr Val AlaVal Leu Gln Tyr Thr Ser Gly Ser Thr 165 170 175 Gly Asn Pro Lys Gly ValVal Ile Asp His Gly Asn Ile Leu Ser Asn 180 185 190 Ala Ala Thr Ile IleAla Val Thr Gly Ile Arg Pro Gly Thr Val Ile 195 200 205 Gly Gly Trp LeuPro His Phe His Asp Met Gly Leu Met Gly Leu Leu 210 215 220 Leu Pro ProLeu Leu Ala Gly Ala Thr Thr Val Leu Ser Ser Pro Val 225 230 235 240 SerPhe Leu Lys Arg Pro Leu Ser Trp Leu Arg Met Ile Asp Arg Tyr 245 250 255Gly Val Glu Ile Thr Ala Ala Pro Asp Phe Ala Tyr Asp Leu Cys Val 260 265270 Ala 
Lys Val Thr Asp Ala Glu Leu Ala Thr Leu Asp Leu Ser Arg Trp 275280 285 Arg Val Ala Ile Asn Gly Ser Glu Pro Val Arg Ala Ala Val Leu Thr290 295 300 Arg Phe Arg Gln Arg Phe Ala Ala Ala Gly Leu Arg Pro Glu ValLeu 305 310 315 320 Thr Pro Ser Phe Gly Met Ala Glu Ala Thr Leu Phe ValSer Gly Asp 325 330 335 Pro Ala Thr Pro Phe Val Val Arg Arg Val Asp ThrAsp Arg Leu Ala 340 345 350 Arg His Arg Phe Glu Pro Ala Pro Asp Gly GlyPro Gly Arg Asp Val 355 360 365 Val Ala Cys Gly Ala Pro Ala Gly Val GluVal Arg Ile Val Asp Pro 370 375 380 Gly Ser Gly Asp Pro Leu Pro Asp GlyAla Val Gly Glu Ile Trp Leu 385 390 395 400 Arg Gly Pro Ser Ile Gly ArgGly Tyr Trp Gly Arg Ala Ala Asn Thr 405 410 415 Ala Gly Phe Gly Ala ValThr Ser Ile Gly Asp Ala Gly Tyr Leu Arg 420 425 430 Thr Gly Asp Leu GlyThr Leu Tyr Glu Gly Gln Leu Tyr Val Thr Gly 435 440 445 Arg Arg Lys AspMet Leu Val Leu Arg Gly Arg Asn Tyr Tyr Pro Gln 450 455 460 Asp Ile GluHis Glu Leu Arg Ala His His Pro Glu Leu Ala Gly Arg 465 470 475 480 ValGly Ala Cys Phe Ala Val Arg Ser Arg Asp Gly Ala Gly Gly Gly 485 490 495Glu Glu Val Leu Val Val Thr His Glu Val Arg Gly Ile Ser Asp Pro 500 505510 Asp Arg Leu Arg Thr Leu Ala Gly Ala Met Arg Leu Thr Val Ala Arg 515520 525 Glu Phe Gly Val Pro Ser Ala Ala Val Leu Leu Leu Arg Pro Gly Ala530 535 540 Val Ala Arg Thr Thr Ser Gly Lys Ile Gln Arg Ser Ala Met ArgGlu 545 550 555 560 Leu Phe Glu Thr Gly Ala Leu Glu Pro Val Gly Gly GluVal Asp Asp 565 570 575 Arg Leu Val Ala Thr Ala Ala Leu Gly Ala Ala Arg580 585 35 2169 DNA Streptomyces fradiae 35 ttgaccgtcc gggcggagcaccggaaagcg tcgaccctgc cgccggggaa cccggccgtc 60 agcagcggcg actccgcgtcccgccgggag aagagggccg ccgctgggag cagttcttcg 120 gcggacccgc tggccggcccccacctggtg gccgcgatct ccgcgacggc cgaggccgac 180 ccggggcgca aggccgtcggtctcgtccgg gatccggagc gcgagggcga ggaggcgctg 240 cggagctacg cctggctcgacgacaccgcc cgccgcatcg ccgtcctcct gcgtgcggcc 300 gggctggaaa cgggcgcacgcgtgctgctg ctcttcccgc agtccgcgga gttcgcggcg 360 gcctacgccg ggtgtctctacgcgggcatg gttgccgtcc ccgcgcccct 
tccgaccggc 420 acctcccatg aggccgcacgcgtcgtcggc atcgcgaagg actccgaggc aggcgccgtc 480 ctcaccgtct ccgaaaccgaggcggacgtc cggcaatggg cggcccgcac cggcctgggc 540 gcgctgcccc tccactgcgtcgacgaactg cccggcgacg ccgaccccga cacgtggcgg 600 gaaccggaga tccgggccgacaccgtggcg gtcctccagt acacctccgg ctccaccggc 660 agccccaagg gggtcgtcgtcacccacggc gcgctcgccg acaacgtgcg cagcctgctc 720 acgggcttcg atctgggatccggcgcccgg ctgggcggct ggctgccgat gtaccacgac 780 atggggctgt tcggcctgctgagcccggca ctgttcagcg gcggagccgc cgtgctgatg 840 agcggcagcg ccttcctgcgccgcccgtcc cagtggctga ggctgatcga ccgcttcggc 900 ctcgtcttct cggcggcgcccgacttcgcc tacgactact gcgtacggcg ggtgagaccc 960 gaggagacgg acgggctcgacctgtcgcgc tggcgctggg cggccaacgg ctccgagccc 1020 atccgcgccg agacgctgcgcgccttcgcc aaggagttcg ccccggccgg actccacccg 1080 aacgccacca ccccttgctacggactggcc gaggcgaccc tgctggtgtc cctgcccacg 1140 ggtgagctgc gcacccgacgggtggacgtc gcggaactgg agaaccaccg cttcgtcgaa 1200 gcggccgtgg gacgcccctcccgcgagatc gtgtcctgcg gccggccccc gtccctggag 1260 atccgcgtcg tcgaccccgcgaccggcaag tccgtcacgg gcggcgacgg agccggcgag 1320 accagggtgg gcgagatcagagtgcgcggc gcgagcgtcg ccaggggcta ctggcagaaa 1380 ccggaggcga ccgccgagacgttcgtcatg gacgcggacg gctccgggcc ctggctgcgc 1440 accggcgacc tcggcgctctgtacgagggc gagctgtacg tcaccggccg tatcaaggaa 1500 ctcctcatcg tgcacggccgcaacatctac ccccatgaca tcgagcacga actgcgcgcc 1560 cgccacgccg aactcggcgctgtcggggcc gccttctccc tcagcaccga atcgggcgag 1620 gttgtggtcg tcacccatgaggtgaacccc accgtccggc ccgagcaggg tcccgagctg 1680 gtgaccgccc tgcgtgcgacgctcgcgcgg gagttcggcc tcgccccggc cggggtggtg 1740 ctggtgcgcc gcggccgcatcccgcgcacc agcagcggca aggtgcaacg ccgcctgacc 1800 gcccggctgt tcagcacgggggaactcgcc caggtccatg ccgaccccgg cgcccaccgc 1860 ctcctggcgg aactcagggaggcgcacgac cgcggcggcg ccttcccgcc cccctccccg 1920 cccgccagcc aggaccccgaggccctgcgg cagcggctgc gcgagctgtg cgccgactgt 1980 ctcggcgtcc ccgtggactccctcgccacg gacgcccccc tcaccgacta cgggatgacc 2040 tccgtcaccg gcaccgccctgtgcgggatg gtggaggagt acctggacgt cgaatgcgac 2100 ctggaactgc tctggcaggagccgacgatc 
gacgggctcg cctcccggct ggcctcgcgc 2160 accgtgcgc 2169 36 723PRT Streptomyces fradiae 36 Leu Thr Val Arg Ala Glu His Arg Lys Ala SerThr Leu Pro Pro Gly 1 5 10 15 Asn Pro Ala Val Ser Ser Gly Asp Ser AlaSer Arg Arg Glu Lys Arg 20 25 30 Ala Ala Ala Gly Ser Ser Ser Ser Ala AspPro Leu Ala Gly Pro His 35 40 45 Leu Val Ala Ala Ile Ser Ala Thr Ala GluAla Asp Pro Gly Arg Lys 50 55 60 Ala Val Gly Leu Val Arg Asp Pro Glu ArgGlu Gly Glu Glu Ala Leu 65 70 75 80 Arg Ser Tyr Ala Trp Leu Asp Asp ThrAla Arg Arg Ile Ala Val Leu 85 90 95 Leu Arg Ala Ala Gly Leu Glu Thr GlyAla Arg Val Leu Leu Leu Phe 100 105 110 Pro Gln Ser Ala Glu Phe Ala AlaAla Tyr Ala Gly Cys Leu Tyr Ala 115 120 125 Gly Met Val Ala Val Pro AlaPro Leu Pro Thr Gly Thr Ser His Glu 130 135 140 Ala Ala Arg Val Val GlyIle Ala Lys Asp Ser Glu Ala Gly Ala Val 145 150 155 160 Leu Thr Val SerGlu Thr Glu Ala Asp Val Arg Gln Trp Ala Ala Arg 165 170 175 Thr Gly LeuGly Ala Leu Pro Leu His Cys Val Asp Glu Leu Pro Gly 180 185 190 Asp AlaAsp Pro Asp Thr Trp Arg Glu Pro Glu Ile Arg Ala Asp Thr 195 200 205 ValAla Val Leu Gln Tyr Thr Ser Gly Ser Thr Gly Ser Pro Lys Gly 210 215 220Val Val Val Thr His Gly Ala Leu Ala Asp Asn Val Arg Ser Leu Leu 225 230235 240 Thr Gly Phe Asp Leu Gly Ser Gly Ala Arg Leu Gly Gly Trp Leu Pro245 250 255 Met Tyr His Asp Met Gly Leu Phe Gly Leu Leu Ser Pro Ala LeuPhe 260 265 270 Ser Gly Gly Ala Ala Val Leu Met Ser Gly Ser Ala Phe LeuArg Arg 275 280 285 Pro Ser Gln Trp Leu Arg Leu Ile Asp Arg Phe Gly LeuVal Phe Ser 290 295 300 Ala Ala Pro Asp Phe Ala Tyr Asp Tyr Cys Val ArgArg Val Arg Pro 305 310 315 320 Glu Glu Thr Asp Gly Leu Asp Leu Ser ArgTrp Arg Trp Ala Ala Asn 325 330 335 Gly Ser Glu Pro Ile Arg Ala Glu ThrLeu Arg Ala Phe Ala Lys Glu 340 345 350 Phe Ala Pro Ala Gly Leu His ProAsn Ala Thr Thr Pro Cys Tyr Gly 355 360 365 Leu Ala Glu Ala Thr Leu LeuVal Ser Leu Pro Thr Gly Glu Leu Arg 370 375 380 Thr Arg Arg Val Asp ValAla Glu Leu Glu Asn His Arg Phe Val Glu 385 390 395 400 Ala Ala Val GlyArg Pro Ser Arg Glu 
Ile Val Ser Cys Gly Arg Pro 405 410 415 Pro Ser LeuGlu Ile Arg Val Val Asp Pro Ala Thr Gly Lys Ser Val 420 425 430 Thr GlyGly Asp Gly Ala Gly Glu Thr Arg Val Gly Glu Ile Arg Val 435 440 445 ArgGly Ala Ser Val Ala Arg Gly Tyr Trp Gln Lys Pro Glu Ala Thr 450 455 460Ala Glu Thr Phe Val Met Asp Ala Asp Gly Ser Gly Pro Trp Leu Arg 465 470475 480 Thr Gly Asp Leu Gly Ala Leu Tyr Glu Gly Glu Leu Tyr Val Thr Gly485 490 495 Arg Ile Lys Glu Leu Leu Ile Val His Gly Arg Asn Ile Tyr ProHis 500 505 510 Asp Ile Glu His Glu Leu Arg Ala Arg His Ala Glu Leu GlyAla Val 515 520 525 Gly Ala Ala Phe Ser Leu Ser Thr Glu Ser Gly Glu ValVal Val Val 530 535 540 Thr His Glu Val Asn Pro Thr Val Arg Pro Glu GlnGly Pro Glu Leu 545 550 555 560 Val Thr Ala Leu Arg Ala Thr Leu Ala ArgGlu Phe Gly Leu Ala Pro 565 570 575 Ala Gly Val Val Leu Val Arg Arg GlyArg Ile Pro Arg Thr Ser Ser 580 585 590 Gly Lys Val Gln Arg Arg Leu ThrAla Arg Leu Phe Ser Thr Gly Glu 595 600 605 Leu Ala Gln Val His Ala AspPro Gly Ala His Arg Leu Leu Ala Glu 610 615 620 Leu Arg Glu Ala His AspArg Gly Gly Ala Phe Pro Pro Pro Ser Pro 625 630 635 640 Pro Ala Ser GlnAsp Pro Glu Ala Leu Arg Gln Arg Leu Arg Glu Leu 645 650 655 Cys Ala AspCys Leu Gly Val Pro Val Asp Ser Leu Ala Thr Asp Ala 660 665 670 Pro LeuThr Asp Tyr Gly Met Thr Ser Val Thr Gly Thr Ala Leu Cys 675 680 685 GlyMet Val Glu Glu Tyr Leu Asp Val Glu Cys Asp Leu Glu Leu Leu 690 695 700Trp Gln Glu Pro Thr Ile Asp Gly Leu Ala Ser Arg Leu Ala Ser Arg 705 710715 720 Thr Val Arg 37 273 DNA Actinoplanes sp. 37 atgtccgaga ccgacctgtccgccgcccgg cacacgcccg agcagatccg ctcctggctg 60 atcgaccgga tcgcctactacgtgatgctg ccgacccagg agatcgagcc ggacgtgtcc 120 ctggccgagt acggcctggactcggtgtac gcgttcgcgc tctgcggcga gatcgaggac 180 acgctcggca tcccgatcgagccgaccctg ctgtgggacg tcgacaccgt cgccaccctc 240 accgcccacc tcgccgaccgcgtcaaccga taa 273 38 90 PRT Actinoplanes sp. 
38 Met Ser Glu Thr Asp LeuSer Ala Ala Arg His Thr Pro Glu Gln Ile 1 5 10 15 Arg Ser Trp Leu IleAsp Arg Ile Ala Tyr Tyr Val Met Leu Pro Thr 20 25 30 Gln Glu Ile Glu ProAsp Val Ser Leu Ala Glu Tyr Gly Leu Asp Ser 35 40 45 Val Tyr Ala Phe AlaLeu Cys Gly Glu Ile Glu Asp Thr Leu Gly Ile 50 55 60 Pro Ile Glu Pro ThrLeu Leu Trp Asp Val Asp Thr Val Ala Thr Leu 65 70 75 80 Thr Ala His LeuAla Asp Arg Val Asn Arg 85 90 39 270 DNA Streptomyces roseosporus 39atgaacccgc ccgaagcggt cagcacgccc agcgaggtca ccgcgtggat caccggacag 60atcgccgagt tcgtgaacga gacacccgac cggatcgccg gtgacgcacc cctgaccgac 120catggcctcg actccgtctc cggagttgcc ctctgcgcgc aggtcgagga ccgctacggg 180atcgaggtcg acccggagct gctgtggagc gtccccacac tcaacgagtt cgtccaggca 240ctgatgcccc agttggccga ccgcacctga 270 40 89 PRT Streptomyces roseosporus40 Met Asn Pro Pro Glu Ala Val Ser Thr Pro Ser Glu Val Thr Ala Trp 1 510 15 Ile Thr Gly Gln Ile Ala Glu Phe Val Asn Glu Thr Pro Asp Arg Ile 2025 30 Ala Gly Asp Ala Pro Leu Thr Asp His Gly Leu Asp Ser Val Ser Gly 3540 45 Val Ala Leu Cys Ala Gln Val Glu Asp Arg Tyr Gly Ile Glu Val Asp 5055 60 Pro Glu Leu Leu Trp Ser Val Pro Thr Leu Asn Glu Phe Val Gln Ala 6570 75 80 Leu Met Pro Gln Leu Ala Asp Arg Thr 85 41 273 DNA Streptomycesghanaensis 41 atggaagcag cacagactcc ccgaacggcc gccgaactcg gcgactggctcacccgcacg 60 gtggccgact acgtcaggtg tgatccggcg gagatcgacc cggacgtgccgctgtccgat 120 tacggcctcg actcgatctc ggcgaccacg gtgtgtgccg acatcgaggaccacttcggt 180 ctgcccgtcg aagtgacgct gatctgggac caccccacga taggcaaactgtcgcaggca 240 ctggccgagg agctcgaaac cgccgtccgc tga 273 42 90 PRTStreptomyces ghanaensis 42 Met Glu Ala Ala Gln Thr Pro Arg Thr Ala AlaGlu Leu Gly Asp Trp 1 5 10 15 Leu Thr Arg Thr Val Ala Asp Tyr Val ArgCys Asp Pro Ala Glu Ile 20 25 30 Asp Pro Asp Val Pro Leu Ser Asp Tyr GlyLeu Asp Ser Ile Ser Ala 35 40 45 Thr Thr Val Cys Ala Asp Ile Glu Asp HisPhe Gly Leu Pro Val Glu 50 55 60 Val Thr Leu Ile Trp Asp His Pro Thr IleGly Lys Leu Ser Gln Ala 65 70 75 80 Leu Ala Glu Glu Leu Glu Thr Ala ValArg 85 90 
43 300 DNA Streptomyces refuineus 43 atgtccctgt ccccgccttc ttcgtccccg ccttcttccc cgcccccttc tccgccgcac 60 gaccccgacg ccctgcggca gtggctgcgc gagcagtgcg ccgactgcct cggcgtcccc 120 ccggcatccc tcgccaccga cgtccccctc accgactacg gcatgacctc cgtcaccggg 180 accgccctgt gcggcatggt ggaggaccac ctggacgtcg agtgcgacct gagcctgctc 240 tggcaggagc agacgatcga cggcatcacc tcccggctgg cctcgcgcac cgcgcgctga 300
44 99 PRT Streptomyces refuineus 44 Met Ser Leu Ser Pro Pro Ser Ser Ser Pro Pro Ser Ser Pro Pro Pro 1 5 10 15 Ser Pro Pro His Asp Pro Asp Ala Leu Arg Gln Trp Leu Arg Glu Gln 20 25 30 Cys Ala Asp Cys Leu Gly Val Pro Pro Ala Ser Leu Ala Thr Asp Val 35 40 45 Pro Leu Thr Asp Tyr Gly Met Thr Ser Val Thr Gly Thr Ala Leu Cys 50 55 60 Gly Met Val Glu Asp His Leu Asp Val Glu Cys Asp Leu Ser Leu Leu 65 70 75 80 Trp Gln Glu Gln Thr Ile Asp Gly Ile Thr Ser Arg Leu Ala Ser Arg 85 90 95 Thr Ala Arg
45 276 DNA Streptomyces aizunensis 45 atgtccgaca tcaccgcccc cccggccacg gccgatgccg cagaggtccg cacctggctg 60 cgcgaatgcg tggccaccta cgtccggctt cccgccgagg acatcgacgt gaacctgccg 120 ctgtccgagt acggtctcga ctccgtgtac gtgctcagcc tgtgcgccga catcgaggac 180 cgctacggca tcgaggtcga gcccaccctg ctctgggacc accccgccat cggcccgatc 240 gccgacgcgc tgaccccgct gctcgccgct cgctag 276
46 91 PRT Streptomyces aizunensis 46 Met Ser Asp Ile Thr Ala Pro Pro Ala Thr Ala Asp Ala Ala Glu Val 1 5 10 15 Arg Thr Trp Leu Arg Glu Cys Val Ala Thr Tyr Val Arg Leu Pro Ala 20 25 30 Glu Asp Ile Asp Val Asn Leu Pro Leu Ser Glu Tyr Gly Leu Asp Ser 35 40 45 Val Tyr Val Leu Ser Leu Cys Ala Asp Ile Glu Asp Arg Tyr Gly Ile 50 55 60 Glu Val Glu Pro Thr Leu Leu Trp Asp His Pro Ala Ile Gly Pro Ile 65 70 75 80 Ala Asp Ala Leu Thr Pro Leu Leu Ala Ala Arg 85 90
47 267 DNA Actinomycete 47 atgtccgacc tcaccgccgt ccccacgccg gagagcctgc gcgcctggct cgtcgactgc 60 gtcgccgacc acctcggccg cgcgcccgcc ggcatcgcca ccgacgtgcc gctgaccacg 120 tacggcctgg actccgtcta cgcgttgtcg atcgccgcgg agctcgagga ccacctggac 180 gtctcgctcg atccgaccct gatctgggac cacccgacga tcgacgccct cagcgcggcc 240 ctggtggccg agctgcgttc cgcctga 267
48 88 PRT Actinomycete 48 Met Ser Asp Leu Thr Ala Val Pro Thr Pro Glu Ser Leu Arg Ala Trp 1 5 10 15 Leu Val Asp Cys Val Ala Asp His Leu Gly Arg Ala Pro Ala Gly Ile 20 25 30 Ala Thr Asp Val Pro Leu Thr Thr Tyr Gly Leu Asp Ser Val Tyr Ala 35 40 45 Leu Ser Ile Ala Ala Glu Leu Glu Asp His Leu Asp Val Ser Leu Asp 50 55 60 Pro Thr Leu Ile Trp Asp His Pro Thr Ile Asp Ala Leu Ser Ala Ala 65 70 75 80 Leu Val Ala Glu Leu Arg Ser Ala 85

1. An isolated polynucleotide encoding an acyl-specific C-domain, wherein said isolated polynucleotide encodes a polypeptide which comprises at least 45% sequence identity to at least one sequence selected from SEQ ID NOS: 1 and 2.
2. An isolated polynucleotide comprising a sequence selected from the group consisting of: (a) a sequence selected from the group consisting of SEQ ID NOS: 5, 7, 9, 11, 13, 15, 17 and 19; (b) a sequence that is complementary to (a); (c) a sequence which hybridizes to said sequence of (a) or (b) under conditions of high stringency; and (d) a sequence which has at least 70% or higher homology to said sequence of (a), (b), or (c).
3. The isolated polynucleotide of claim 1, wherein said acyl-specific C-domain is involved in lipopeptide acyl-capping.
4. The isolated polynucleotide of claim 3, wherein said isolated polynucleotide resides in a gene locus selected from the group consisting of: (a) the biosynthetic locus for ramoplanin from Actinoplanes sp. ATCC 33076; (b) the biosynthetic locus for A21978C from Streptomyces roseosporus NRRL 11379; (c) the biosynthetic locus for A54145 from Streptomyces fradiae ATCC 18158; (d) the biosynthetic locus for the calcium-dependent antibiotic from Streptomyces coelicolor A3(2); (e) the biosynthetic locus for a lipopeptide natural product from Streptomyces ghanaensis NRRL B-12104; (f) the biosynthetic locus for a lipopeptide natural product from Streptomyces refuineus NRRL 3143; (g) the biosynthetic locus for a lipopeptide natural product from Streptomyces aizunensis NRRL B-11277; (h) the biosynthetic locus for a lipopeptide natural product from Actinoplanes nipponensis FD 24834 ATCC 31145; and (i) the biosynthetic locus for a lipopeptide natural product from a Streptomyces sp. organism.
5. Two or more isolated polynucleotides, wherein the first polynucleotide is a polynucleotide of claim 1, and the second polynucleotide encodes a polypeptide selected from the group consisting of: (j) a polypeptide having at least 55% sequence identity to SEQ ID NO: 3, and (k) a polypeptide having at least 50% sequence identity to SEQ ID NO: 4.
6. An isolated polynucleotide comprising a sequence selected from the group consisting of: (a) a sequence selected from the group consisting of SEQ ID NOs. 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45 and 47; (b) a sequence that is complementary to (a); (c) a sequence which hybridizes to said sequence of (a) or (b) under conditions of high stringency, and (d) a sequence which has at least 70% or higher homology to said sequence of (a), (b), or (c).
7. The isolated polynucleotide of claim 6, wherein said isolated polynucleotide resides in a biosynthetic locus selected from the group consisting of: (a) the biosynthetic locus for ramoplanin from Actinoplanes sp. ATCC 33076; (b) the biosynthetic locus for A21978C from Streptomyces roseosporus NRRL 11379; (c) the biosynthetic locus for A54145 from Streptomyces fradiae ATCC 18158; (d) the biosynthetic locus for a lipopeptide natural product from Streptomyces ghanaensis NRRL B-12104; (e) the biosynthetic locus for a lipopeptide natural product from Streptomyces refuineus NRRL 3143; (f) the biosynthetic locus for a lipopeptide natural product from Streptomyces aizunensis NRRL B-11277; (g) the biosynthetic locus for a lipopeptide natural product from Actinoplanes nipponensis FD 24834 ATCC 31145; and (h) the biosynthetic locus for a lipopeptide natural product from a Streptomyces sp. organism.
8. An isolated acyl-specific C-domain, encoded by a polynucleotide which comprises a sequence selected from the group consisting of: (a) a sequence selected from the group consisting of SEQ ID NOs. 5, 7, 9, 11, 13, 15, 17, 19; and (b) a sequence that is complementary to (a); (c) a sequence which hybridizes to said sequence of (a) or (b) under conditions of high stringency; and (d) a sequence which has at least 70% or higher homology to said sequence of (a), (b), or (c).
9. An isolated acyl-specific C-domain comprising at least 45% sequence homology to at least one sequence selected from SEQ ID NO. 1 and SEQ ID NO. 2.
10. An isolated acyl-specific C-domain comprising a polypeptide sequence selected from the group consisting of: (a) a sequence selected from the group consisting of SEQ ID NOs. 6, 8, 10, 12, 14, 16, 18, 20 and 22; and (b) a sequence which has at least 70% or higher homology to said sequence of (a).
11. Two or more isolated polypeptides, wherein the first isolated polypeptide is an acyl-specific C-domain according to claim 9; and the second isolated polypeptide is selected from the group consisting of: (a) a polypeptide having at least 55% identity to SEQ ID NO. 3 and (b) a polypeptide having at least 50% identity to SEQ ID NO. 4.
12. An isolated polypeptide comprising a polypeptide selected from the group consisting of: (a) SEQ ID NOs. 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46 and 48; and (b) a sequence which has at least 70% or higher homology to said sequence of (a).
13. An N-acyl-capping cassette comprising at least one acyl-specific C-domain polypeptide and another polypeptide selected from the group consisting of an adenylating protein and an acyl-carrier protein.
14. A computer readable medium, comprising: (a) a computer program stored on said media containing instructions sufficient to implement a process for effecting the identification, analysis, or modeling of a representation of a polynucleotide or polypeptide sequence; (b) data stored on said media representing a sequence of a polynucleotide selected from the group consisting of: i) a polynucleotide encoding an acyl-specific C-domain, said polynucleotide encoding a polypeptide having at least 45% sequence identity with either SEQ ID NO: 1 or SEQ ID NO: 2; ii) a polynucleotide encoding a polypeptide having at least 55% sequence identity with SEQ ID NO: 3; and iii) a polynucleotide encoding a polypeptide having at least 50% sequence identity with SEQ ID NO: 4; and (c) a data structure reflecting the underlying organization and structure of said data to facilitate said computer program access to data elements corresponding to logical sub-components of the sequence, said data structure being inherent in said program and in the way in which said computer program organizes and accesses said data.
15. A computer readable medium, comprising: (a) a computer program stored on said media containing instructions sufficient to implement a process for effecting the identification, analysis, or modeling of a representation of a polypeptide sequence; (b) data stored on said media representing a sequence of a polypeptide selected from the group consisting of: i) a polypeptide representing an acyl-specific C-domain and having at least 45% sequence identity with either SEQ ID NO: 1 or SEQ ID NO: 2; ii) a polypeptide having at least 55% sequence identity with SEQ ID NO: 3; and iii) a polypeptide having at least 50% sequence identity with SEQ ID NO: 4; and (c) a data structure reflecting the underlying organization and structure of said data to facilitate said computer program access to data elements corresponding to logical sub-components of the sequence, said data structure being inherent in said program and in the way in which said computer program organizes and accesses said data.
16. A memory for storing data that can be accessed by a computer programmed to implement a process for effecting the identification, analysis, or modeling of a sequence of a polynucleotide or a polypeptide, said memory comprising data representing a polynucleotide selected from the group consisting of: (a) a polynucleotide encoding an acyl-specific C-domain, said polynucleotide encoding a polypeptide having at least 45% sequence identity with either SEQ ID NO: 1 or SEQ ID NO: 2; (b) a polynucleotide encoding a polypeptide having at least 55% sequence identity with SEQ ID NO: 3; and (c) a polynucleotide encoding a polypeptide having at least 50% sequence identity with SEQ ID NO: 4.
17. A memory for storing data that can be accessed by a computer programmed to implement a process for effecting the identification, analysis, or modeling of a sequence of a polypeptide, said memory comprising data representing a polypeptide selected from the group consisting of: (a) a polypeptide having at least 45% sequence identity with either SEQ ID NO: 1 or SEQ ID NO: 2; (b) a polypeptide having at least 55% sequence identity with SEQ ID NO: 3; and (c) a polypeptide having at least 50% sequence identity with SEQ ID NO: 4.
18. A method for detecting a polypeptide involved in lipopeptide biosynthesis or a polynucleotide encoding such a polypeptide comprising the step of identifying: (a) a polypeptide having at least 45% sequence identity to SEQ ID NO: 1 or SEQ ID NO: 2, or (b) a polynucleotide encoding a polypeptide having at least 45% sequence identity to SEQ ID NO: 1 or SEQ ID NO: 2, and wherein said at least 45% sequence identity indicates a polypeptide involved in lipopeptide biosynthesis.
19. A method according to claim 18 wherein the identifying step comprises the steps of: (a) providing a reference polynucleotide or polypeptide sequence selected from the group consisting of polynucleotide or polypeptide sequences representing an acyl-specific domain; (b) comparing said reference sequence to one or more candidate polynucleotide or polypeptide sequences stored on a computer readable medium; (c) determining the level of homology between said reference sequence and said one or more candidate sequences, and (d) identifying a candidate sequence which shares at least 70% homology with said reference sequence.
 20. The method of claim 19,wherein said reference sequence is a polypeptide of SEQ ID NOS. 6, 8,10, 12, 14, 16, 18, 20, 22 or a polynucleotide encoding a polypeptide ofSEQ ID NOS. 6, 8, 10, 12, 14, 16, 18, 20 or
 22. 21. The method of claim19 further comprising determining structural motifs common to saidcandidate sequence and said reference sequence.
22. The method of claim 18 further comprising the step of identifying, in proximity to the polypeptide of a) or the polynucleotide of b) at least c) one polypeptide having at least 55% sequence identity to SEQ ID NO: 3 or one polynucleotide sequence encoding a polypeptide having at least 55% sequence identity to SEQ ID NO: 3; or d) one polypeptide having at least 50% sequence identity to SEQ ID NO: 4 or one polynucleotide sequence encoding a polypeptide having at least 50% sequence identity to SEQ ID NO: 4.
23. The method according to claim 22 wherein (a) the polypeptide of c) or d) is a polypeptide of SEQ ID NO: 24, 26, 28, 30, 32, 34, 36, 38 or 40, or a polypeptide having at least 70% sequence identity to a polypeptide of SEQ ID NO: 24, 26, 28, 30, 32, 34, 36, 38 or 40; or (b) the nucleotide of c) or d) is a nucleotide encoding a polypeptide of SEQ ID NO: 24, 26, 28, 30, 32, 34, 36, 38 or 40 or a nucleotide encoding a polypeptide having at least 70% sequence identity to a polypeptide of SEQ ID NO: 24, 26, 28, 30, 32, 34, 36, 38 or 40.
24. A computer system comprising: (a) a database of reference sequences, wherein the reference sequences encode proteins involved in lipid biosynthesis, and wherein the reference sequences include one or more of: (i) a polypeptide sequence representing an acyl-specific C-domain or a polynucleotide encoding an acyl-specific C-domain; and (b) a user interface capable of: (i) receiving a test sequence for comparing against each of the reference sequences in the database; and (ii) displaying the results of the comparison.
25. A computer system of claim 24 wherein the reference sequences further include one or more of: (iv) a polypeptide sequence representing an adenylating enzyme or a polynucleotide encoding an adenylating enzyme; and (v) a polypeptide sequence representing an acyl carrier protein or a polynucleotide encoding an acyl carrier protein.
26. A computer system of claim 25 wherein (a) the reference sequence of (i) is selected from SEQ ID NOS: 1, 2, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21 and 22; (b) the reference sequence of (iv) is selected from SEQ ID NOS: 3, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33 and 34; and (c) the reference sequence of (v) is selected from SEQ ID NO: 4, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47 and 48.
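
By way of illustration only, the comparison steps recited in claim 19 (providing a reference sequence, comparing it to stored candidate sequences, determining a level of homology, and identifying candidates at or above a threshold) might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation contemplated by the claims: percent identity over a simple Needleman-Wunsch global alignment is used as a stand-in for the level of homology, the scoring values (match 1, mismatch -1, gap -1) are arbitrary, and the function names, the candidate names and the toy one-letter sequences are hypothetical rather than taken from the sequence listing. A practical system would more likely rely on a substitution matrix and a dedicated alignment tool.

```python
from typing import Dict, List, Tuple


def percent_identity(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -1) -> float:
    """Percent identity over the columns of a global (Needleman-Wunsch) alignment.

    Scoring values are illustrative assumptions, not values from the document.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back one optimal path, counting alignment columns and identical pairs.
    i, j = n, m
    identical = columns = 0
    while i > 0 or j > 0:
        columns += 1
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                if a[i - 1] == b[j - 1]:
                    identical += 1
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
    return 100.0 * identical / columns if columns else 0.0


def screen_candidates(reference: str, candidates: Dict[str, str],
                      threshold: float = 70.0) -> List[Tuple[str, float]]:
    """Return (name, percent identity) for candidates at or above the threshold."""
    hits = []
    for name, seq in candidates.items():
        pct = percent_identity(reference, seq)
        if pct >= threshold:
            hits.append((name, pct))
    # Best-scoring candidates first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy data (one-letter amino acid codes), for illustration only.
reference = "ACDEFGHIKL"
candidates = {
    "cand_1": "ACDEFGHIKL",
    "cand_2": "ACDEFGHIKV",
    "cand_3": "WWWWWWWWWW",
}
hits = screen_candidates(reference, candidates, threshold=70.0)
```

In this sketch the threshold argument plays the role of the 70% level of claim 19; a threshold of 45.0 would correspond to the identity level recited in claim 18. The same screening function could sit behind the database and user interface of claim 24, with the test sequence supplied as the reference argument.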